Zinc finger binding domains for GNN

ABSTRACT

Zinc finger-nucleotide binding polypeptides having binding specificity for target nucleotides containing one or more GNN triplets are provided. Compositions containing such polypeptides and the use of such polypeptides and compositions for regulating gene expression are also provided.

CROSS-REFERENCES

This application is a divisional of application Ser. No. 10/646,919 by Barbas, filed on Aug. 21, 2003, which is a continuation of application Ser. No. 09/494,150 filed on Jan. 28, 2000, now U.S. Pat. No. 6,610,512, which is a continuation-in-part of PCT Patent Application Serial No. PCT/EP99/07742, filed on Oct. 14, 1999, which is a continuation-in-part of application Ser. No. 09/173,941, filed on Oct. 16, 1998, now U.S. Pat. No. 6,140,081.

TECHNICAL FIELD OF THE INVENTION

The field of this invention is zinc finger protein binding to target nucleotides. More particularly, the present invention pertains to amino acid residue sequences within the α-helical domain of zinc fingers that specifically bind to target nucleotides of the formula 5′-(GNN)-3′.

BACKGROUND OF THE INVENTION

The paradigm that the primary mechanism for governing the expression of genes involves protein switches that bind DNA in a sequence specific manner was established in 1967 (Ptashne, M. (1967) Nature (London) 214, 323-4). Diverse structural families of DNA binding proteins have been described. Despite a wealth of structural diversity, the Cys₂-His₂ zinc finger motif constitutes the most frequently utilized nucleic acid binding motif in eukaryotes. This observation is as true for yeast as it is for man. The Cys₂-His₂ zinc finger motif, identified first in the DNA and RNA binding transcription factor TFIIIA (Miller, J., McLachlan, A. D. & Klug, A. (1985) Embo J 4, 1609-14), is perhaps the ideal structural scaffold on which a sequence specific protein might be constructed. A single zinc finger domain consists of approximately 30 amino acids with a simple ββα fold stabilized by hydrophobic interactions and the chelation of a single zinc ion (Miller, J., McLachlan, A. D. & Klug, A. (1985) Embo J 4, 1609-14, Lee, M. S., Gippert, G. P., Soman, K. V., Case, D. A. & Wright, P. E. (1989) Science 245, 635-7). Presentation of the α-helix of this domain into the major groove of DNA allows for sequence specific base contacts. Each zinc finger domain typically recognizes three base pairs of DNA (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945), though variation in helical presentation can allow for recognition of a more extended site (Pavletich, N. P. & Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. (1996) Proc Natl Acad Sci U S A 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. & Wright, P. E. (1997) J. Mol. Biol. 273, 183-206). In contrast to most transcription factors that rely on dimerization of protein domains for extending protein-DNA contacts to longer DNA sequences or addresses, simple covalent tandem repeats of the zinc finger domain allow for the recognition of longer asymmetric sequences of DNA by this motif. We have recently described polydactyl zinc finger proteins that contain 6 zinc finger domains and bind 18 base pairs of contiguous DNA sequence (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) PNAS 94, 5525-5530). Recognition of 18 bps of DNA is sufficient to describe a unique DNA address within all known genomes, a requirement for using polydactyl proteins as highly specific gene switches. Indeed, control of both gene activation and repression has been shown using these polydactyl proteins in a model system (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) PNAS 94, 5525-5530).

Since each zinc finger domain typically binds three base pairs of sequence, a complete recognition alphabet requires the characterization of 64 domains. Existing information which could guide the construction of these domains has come from three types of studies: structure determination (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945, Pavletich, N. P. & Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. (1996) Proc Natl Acad Sci U S A 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. & Wright, P. E. (1997) J. Mol. Biol. 273, 183-206, Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 2938-2943, Narayan, V. A., Kriwacki, R. W. & Caradonna, J. P. (1997) J. Biol. Chem. 272, 7801-7809), site-directed mutagenesis (Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 5617-5621, Nardelli, J., Gibson, T. J., Vesque, C. & Charnay, P. (1991) Nature 349, 175-178, Nardelli, J., Gibson, T. & Charnay, P. (1992) Nucleic Acids Res. 20, 4137-44, Taylor, W. E., Suruki, H. K., Lin, A. H. T., Naraghi-Arani, P., Igarashi, R. Y., Younessian, M., Katkus, P. & Vo, N. V. (1995) Biochemistry 34, 3222-3230, Desjarlais, J. R. & Berg, J. M. (1992) Proteins: Struct., Funct., Genet. 12, 101-4, Desjarlais, J. R. & Berg, J. M. (1992) Proc Natl Acad Sci U S A 89, 7345-9), and phage-display selections (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci U S A 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33, Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348). All have contributed significantly to our understanding of zinc finger/DNA recognition, but each has its limitations. Structural studies have identified a diverse spectrum of protein/DNA interactions but do not explain whether alternative interactions might be more optimal. Further, while interactions that allow for sequence specific recognition are observed, little information is provided on how alternate sequences are excluded from binding. These questions have been partially addressed by mutagenesis of existing proteins, but the data are always limited by the number of mutants that can be characterized. Phage display and selection of randomized libraries overcome certain numerical limitations, but providing the appropriate selective pressure to ensure that both specificity and affinity drive the selection is difficult. Experimental studies from several laboratories (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci U S A 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33), including our own (Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348), have demonstrated that it is possible to design or select a few members of this recognition alphabet. However, the specificity and affinity of these domains for their target DNA was rarely investigated in a rigorous and systematic fashion in these early studies.

Since Jacob and Monod questioned the chemical nature of the repressor and proposed a scheme by which the synthesis of individual proteins within a cell might be provoked or repressed, specific experimental control of gene expression has been a tantalizing prospect (Jacob, F. & Monod, J. (1961) J. Mol. Biol. 3, 318-356). It is now well established that genomes are regulated at the level of transcription primarily through the action of proteins known as transcription factors that bind DNA in a sequence specific fashion. Often these protein factors act in a complex combinatorial manner allowing temporal, spatial, and environmentally-responsive control of gene expression (Ptashne, M. (1997) Nature Medicine 3, 1069-1072). Transcription factors frequently act both through a DNA-binding domain which localizes the protein to a specific site within the genome, and through accessory effector domains which act to provoke (activate) or repress transcription at or near that site (Cowell, I. G. (1994) Trends Biochem. Sci. 19, 38-42). Effector domains, such as the activation domain VP16 (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335, 563-564) and the repression domain KRAB (Margolin, J. F., Friedman, J. R., Meyer, W. K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513), are typically modular and retain their activity when they are fused to other DNA-binding proteins. Whereas genes might be readily controlled by directing transcription factors to particular sites within a genome, the design of DNA binding proteins that might be fashioned to bind any given sequence has been a daunting challenge. The present disclosure is based on the recognition of the structural features unique to the Cys₂-His₂ class of nucleic acid-binding, zinc finger proteins. The Cys₂-His₂ zinc finger domain consists of a simple ββα fold of approximately 30 amino acids in length. Structural stability of this fold is achieved by hydrophobic interactions and by chelation of a single zinc ion by the conserved Cys₂-His₂ residues (Lee, M. S., Gippert, G. P., Soman, K. V., Case, D. A. & Wright, P. E. (1989) Science 245, 635-637). Nucleic acid recognition is achieved through specific amino acid side chain contacts originating from the α-helix of the domain, which typically binds three base pairs of DNA sequence (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure 4, 1171-1180). Unlike other nucleic acid recognition motifs, simple covalent linkage of multiple zinc finger domains allows the recognition of extended asymmetric sequences of DNA. Studies of natural zinc finger proteins have shown that three zinc finger domains can bind 9 bp of contiguous DNA sequence (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17, Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-87). Whereas recognition of 9 bp of sequence is insufficient to specify a unique site within even the small genome of E. coli, polydactyl proteins containing six zinc finger domains can specify 18-bp recognition (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530). With respect to the development of a universal system for gene control, an 18-bp address can be sufficient to specify a single site within all known genomes. While polydactyl proteins of this type are unknown in nature, their efficacy in gene activation and repression within living human cells has recently been shown (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530).
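By way of illustration only, the address-length argument above can be checked with a short calculation. The sketch below assumes a random genome with equal base frequencies and counts both strands; the genome sizes are approximate round figures assumed for the example and are not taken from this disclosure, and the function name is hypothetical.

```python
def expected_chance_matches(site_length_bp: int, genome_size_bp: int) -> float:
    """Expected number of exact chance matches to one specific site.

    Assumption: bases are independent and equally frequent, and a site
    may occur on either strand (hence the factor of 2).
    """
    possible_sites = 4 ** site_length_bp  # number of distinct sequences of this length
    return 2 * genome_size_bp / possible_sites


# Approximate genome sizes, assumed for illustration (not from this disclosure).
ECOLI_GENOME_BP = 4_600_000
HUMAN_GENOME_BP = 3_200_000_000

if __name__ == "__main__":
    for name, size in (("E. coli", ECOLI_GENOME_BP), ("human", HUMAN_GENOME_BP)):
        for length in (9, 18):
            matches = expected_chance_matches(length, size)
            print(f"{name}: {length}-bp site, ~{matches:.3g} chance matches expected")
```

Under these assumptions a 9-bp site is expected to recur many times by chance even in the E. coli genome, whereas an 18-bp site is expected to occur by chance less than once in a genome of human size. Real genomes are not random, so the figures are indicative only.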

BRIEF SUMMARY OF THE INVENTION

In one aspect, the present invention provides an isolated and purified zinc finger-nucleotide binding polypeptide that contains the amino acid residue sequence of any of SEQ ID NO:1-16. In a related aspect, this invention further provides compositions comprising from two to about 12 such zinc finger-nucleotide binding polypeptides. The composition preferably contains from 2 to about 6 polypeptides. In a preferred embodiment, the zinc finger-nucleotide binding polypeptides are operatively linked, preferably by an amino acid residue linker having the sequence of SEQ ID NO: 111. A composition of this invention specifically binds a nucleotide target that contains the sequence 5′-(GNN)_(n)-3′, wherein each N is A, C, G, or T with the proviso that all N's cannot be C and where n is preferably 2 to 6. A polypeptide or composition can be further operatively linked to one or more transcription modulating factors such as transcription activators or transcription suppressors or repressors. The present invention also provides an isolated and purified polynucleotide that encodes a polypeptide or composition of this invention and an expression vector containing such a polynucleotide.

In a still further aspect, the present invention provides a process of regulating the function of a nucleotide sequence that contains the sequence 5′-(GNN)_(n)-3′, where n is an integer from 1 to 6, the process comprising exposing the nucleotide sequence to an effective amount of a composition of this invention operatively linked to one or more transcription modulating factors. The 5′-(GNN)_(n)-3′ sequence can be found in the transcribed region or promoter region of the nucleotide or within an expressed sequence tag. In a preferred embodiment, the nucleotide sequence is part of an oncogene sequence. More preferably, the target nucleotide sequence is contained in a gene that encodes a member of an erbB receptor family. More preferably, the target nucleotide sequence is contained in an erbB gene. Preferred erbB genes are the human erbB-2 and erbB-3 genes.

The present disclosure demonstrates the simplicity and efficacy of a general strategy for the rapid production of gene switches. With a family of defined zinc finger domains recognizing sequences of the 5′-GNN-3′ subset of a 64 member zinc finger alphabet, polydactyl proteins specifically recognizing novel 9- or, for the first time, 18-bp sequences were constructed and characterized. Potent transcription factors were generated and shown to control both gene activation and repression. Gene activation was achieved using the herpes simplex virus VP16 activation domain (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335, 563-564) and a recombinant tetrameric repeat of its minimal activation domain. Gene repression or silencing was achieved using three effector domains of human origin, the Krüppel-associated box (KRAB) (Margolin, J. F., Friedman, J. R., Meyer, W. K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513), the ERF repressor domain (ERD) (Sgouras, D. N., Athanasiou, M. A., Beal, G. J., Jr., Fisher, R. J., Blair, D. G. & Mavrothalassitis, G. J. (1995) EMBO J. 14, 4781-4793), and the mSIN3 interaction domain (SID) (Ayer, D. E., Laherty, C. D., Lawrence, Q. A., Armstrong, A. P. & Eisenman, R. N. (1996) Mol. Cell. Biol. 16, 5772-5781). Using luciferase reporter gene assays in human epithelial cells, the data show that artificial transcriptional regulators, designed to target the promoter of the proto-oncogene erbB-2/HER-2, can ablate or activate gene expression in a specific manner. For the first time, gene activation or repression was achieved by targeting within the gene transcript, suggesting that information obtained from expressed sequence tags (ESTs) may be sufficient for the construction of gene switches. The novel methodology and materials described herein promise diverse applications in gene therapy, transgenic organisms, functional genomics, and other areas of cell and molecular biology.

BRIEF DESCRIPTION OF THE DRAWING

In the drawing, which forms a portion of the specification:

FIG. 1 (shown in six panels) shows the binding specificity of regions of zinc finger-nucleotide binding polypeptides of the invention.

FIG. 2 shows (A) Alignment of E2C target sequence in the erbB-2 5′-UTR with the E3 target sequence in the erbB-3 5′-UTR. Numbers indicate the distance from the ATG translation initiation codon. (B) Amino acid sequence alignment of E2C and E3 proteins. DNA recognition helix sequence positions −1 to 6 of each finger, as well as sequence differences, are boxed.

DETAILED DESCRIPTION OF THE INVENTION

I. The Invention

The present invention provides zinc finger-nucleotide binding polypeptides, compositions containing one or more such polypeptides and the use of the polypeptides and compositions for modulating gene expression.

II. Compounds

A compound of this invention is an isolated zinc finger-nucleotide binding polypeptide that binds to a GNN nucleotide sequence and modulates the function of that nucleotide sequence. The polypeptide can enhance or suppress transcription of a gene, and can bind to DNA or RNA. A zinc finger-nucleotide binding polypeptide refers to a polypeptide which is a derivatized form of a wild-type zinc finger protein or one produced through recombination. A polypeptide may be a hybrid which contains zinc finger domain(s) from one protein linked to zinc finger domain(s) of a second protein, for example. The domains may be wild type or mutagenized. A polypeptide includes a truncated form of a wild type zinc finger protein. Examples of zinc finger proteins from which a polypeptide can be produced include TFIIIA and zif268.

A zinc finger-nucleotide binding polypeptide of this invention comprises a unique heptamer (contiguous sequence of 7 amino acid residues) within the α-helical domain of the polypeptide, which heptameric sequence determines binding specificity to a target nucleotide. That heptameric sequence can be located anywhere within the α-helical domain but it is preferred that the heptamer extend from position −1 to position 6 as the residues are conventionally numbered in the art. A polypeptide of this invention can include any β-sheet and framework sequences known in the art to function as part of a zinc finger protein. A large number of zinc finger-nucleotide binding polypeptides were made and tested for binding specificity against target nucleotides containing a GNN triplet. The results of those studies are summarized in FIG. 1. In FIG. 1, the GNN triplet binding specificity for each peptide is shown in the right-hand column, with the highest specificity shown first and in boldface. In FIG. 1, SEQ ID NOs: are shown in parentheses. For each particular GNN target (e.g., GAA, shown in the right-hand column of FIG. 1), the sequences are listed in order of decreasing specificity for that triplet.

As shown in FIG. 1, a striking conservation of all three of the primary DNA contact positions (−1, 3, and 6) was observed for virtually all the clones of a given target. Although many of these residues were observed previously at these positions following selections with much less complete libraries, the extent of conservation observed here represents a dramatic improvement over earlier studies (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci U S A 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348). The present invention discloses that the teaching of the prior art, namely that the three helical positions −1, 3, and 6 of a zinc finger domain are sufficient to allow for the detailed description of the DNA binding specificity of the domain, is incorrect.

Typically, phage selections have shown a consensus selection in only one or two of these positions. The greatest sequence variation occurred at the residues in positions 1 and 5, which do not make base contacts in the Zif268/DNA structure and were expected not to contribute significantly to recognition (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180). Variation in positions 1 and 5 also implied that the conservation in the other positions was due to their interaction with the DNA and not simply the fortuitous amplification of a single clone due to other reasons. Conservation of residue identity at position 2 was also observed. The conservation of position −2 is somewhat artifactual; the NNK library had this residue fixed as serine. This residue makes contacts with the DNA backbone in the Zif268 structure. Both libraries contained an invariant leucine at position 4, a critical residue in the hydrophobic core that stabilizes folding of this domain.

Impressive amino acid conservation was observed for recognition of the same nucleotide in different targets. For example, Asn in position 3 (Asn3) was virtually always selected to recognize adenine in the middle position, whether in the context of GAG, GAA, GAT, or GAC. Gln-1 and Arg-1 were always selected to recognize adenine or guanine, respectively, in the 3′ position regardless of context. Amide side chain based recognition of adenine by Gln or Asn is well documented in structural studies as is the Arg guanidinium side chain to guanine contact with a 3′ or 5′ guanine (Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-7). More often, however, two or three amino acids were selected for nucleotide recognition. His3 or Lys3 (and to a lesser extent, Gly3) were selected for the recognition of a middle guanine. Ser3 and Ala3 were selected to recognize a middle thymine. Thr3, Asp3, and Glu3 were selected to recognize a middle cytosine. Asp and Glu were also selected in position −1 to recognize a 3′ cytosine, while Thr-1 and Ser-1 were selected to recognize a 3′ thymine.

Selected Zif268 variants were subcloned into a bacterial expression vector, and the proteins overexpressed (finger-2 proteins, hereafter referred to by the subsite for which they were panned). It is important to study soluble proteins rather than phage-fusions since it is known that the two may differ significantly in their binding characteristics (Crameri, A., Cwirla, S. & Stemmer, W. P. (1996) Nat Med. 2, 100-102). The proteins were tested for their ability to recognize each of the 16 5′-GNN-3′ finger-2 subsites using a multi-target ELISA assay. This assay provided an extremely rigorous test for specificity since there were always six “non-specific” sites which differed from the “specific” site by only a single nucleotide out of a nine-nucleotide target. Many of the phage-selected finger-2 proteins showed exquisite specificity, while others demonstrated varying degrees of crossreactivity. Some polypeptides actually bound better to subsites other than those for which they were selected.
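The count of six single-nucleotide “non-specific” sites follows from the layout of the assay: with the 5′ guanine of the finger-2 subsite held fixed, the middle and 3′ positions can each be changed to three other bases. The following is a minimal sketch of that enumeration, offered for illustration; the function name is hypothetical and the sketch assumes that the flanking finger-1 and finger-3 subsites are identical across all sixteen targets.

```python
BASES = "ACGT"


def gnn_single_mismatches(subsite: str) -> list[str]:
    """List the GNN subsites that differ from `subsite` at exactly one position.

    The 5' G is held fixed, so only the middle (index 1) and 3' (index 2)
    positions are varied.
    """
    subsite = subsite.upper()
    if len(subsite) != 3 or subsite[0] != "G" or any(b not in BASES for b in subsite):
        raise ValueError(f"not a 5'-GNN-3' subsite: {subsite!r}")
    neighbors = []
    for pos in (1, 2):
        for base in BASES:
            if base != subsite[pos]:
                neighbors.append(subsite[:pos] + base + subsite[pos + 1:])
    return neighbors


if __name__ == "__main__":
    print(gnn_single_mismatches("GAG"))
```

For any of the sixteen subsites the function returns six neighbors, in agreement with the six closely related sites against which each protein is challenged in the multi-target assay.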

Attempts were made to improve binding specificity by modifying the recognition helix using site-directed mutagenesis. Data from our selections and structural information guided mutant design. In the most exhaustive study performed to date, over 100 mutant proteins were characterized in an effort to expand our understanding of the rules of recognition. Although helix positions 1 and 5 are not expected to play a direct role in DNA recognition, the best improvements in specificity always involved modifications in these positions. These residues have been observed to make phosphate backbone contacts, which contribute to affinity in a non-sequence specific manner. Removal of non-specific contacts increases the importance of the specific contacts to the overall stability of the complex, thereby enhancing specificity. For example, the specificity of polypeptides for target triplets GAC, GAA, and GAG was improved simply by replacing atypical, charged residues in positions 1 and 5 with smaller, uncharged residues.

Another class of modifications involved changes to both binding and non-binding residues. The crossreactivity of polypeptides for GGG and the finger-2 subsite GAG was abolished by the modifications His3Lys and Thr5Val. It is interesting to note that His3 was unanimously selected during panning to recognize the middle guanine, although Lys3 provided better discrimination of A and G. This suggests that panning conditions for this protein may have favored selection by a parameter such as affinity over that of specificity. In the Zif268 structure, His3 donates a hydrogen bond to the N7 of the middle guanine (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180). This bond could also be made with N7 of adenine, and in fact Zif268 does not discriminate between G and A in this position (Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-87). His3 was found to specify only a middle guanine in polypeptides targeted to GGA, GGC, and GGT, even though Lys3 was selected during panning for GGC and GGT. Similarly, the multiple crossreactivities of polypeptides targeted to GTG were attenuated by modifications Lys1Ser and Ser3Glu, resulting in a 5-fold loss in affinity. Glu3 has been shown to be very specific for cytosine in binding site selection studies of Zif268 (Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-87). No structural studies show an interaction of Glu3 with the middle thymine, and Glu3 was never selected to recognize a middle thymine in our study or any others (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci U S A 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33, Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348). Despite this, the Ser3Glu modification favored the recognition of a middle thymine over cytosine. These examples illustrate the limitations of relying on previous structures and selection data to understand the structural elements underlying specificity. It should also be emphasized that improvements by modifications involving positions 1 and 5 could not have been predicted by existing “recognition codes” (Desjarlais, J. R. & Berg, J. M. (1992) Proc Natl Acad Sci U S A 89, 7345-9, Suzuki, M., Gerstein, M. & Yagi, N. (1994) Nucleic Acids Res. 22, 3397-405, Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 11168-72, Choo, Y. & Klug, A. (1997) Curr. Opin. Struct. Biol. 7, 117-125), which typically only consider positions −1, 2, 3, and 6. Only by the combination of selection and site-directed mutagenesis can we begin to fully understand the intricacies of zinc finger/DNA recognition.

From the combined selection and mutagenesis data it emerged that specific recognition of many nucleotides could be best accomplished using motifs, rather than a single amino acid. For example, the best specification of a 3′ guanine was achieved using the combination of Arg-1, Ser1, and Asp2 (the RSD motif). By using Val5 and Arg6 to specify a 5′ guanine, recognition of subsites GGG, GAG, GTG, and GCG could be accomplished using a common helix structure (SRSD-X-LVR) differing only in the position 3 residue (Lys3 for GGG, Asn3 for GAG, Glu3 for GTG, and Asp3 for GCG). Similarly, 3′ thymine was specified using Thr-1, Ser1, and Gly2 in the final clones (the TSG motif). Further, a 3′ cytosine could be specified using Asp-1, Pro1, and Gly2 (the DPG motif) except when the subsite was GCC; Pro1 was not tolerated by this subsite. Specification of a 3′ adenine was with Gln-1, Ser1, Ser2 in two clones (QSS motif). Residues of positions 1 and 2 of the motifs were studied for each of the 3′ bases and found to provide optimal specificity for a given 3′ base as described here.
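The motif-based view described above lends itself to a small lookup, sketched here for illustration only. The dictionaries transcribe the motifs and position 3 residues named in the preceding paragraph; the names `THREE_PRIME_MOTIF`, `POSITION3_FOR_GNG`, and `gng_helix` are hypothetical. The sketch reproduces the common SRSD-X-LVR template literally, so its output for a given subsite need not coincide with the sequence listed for that subsite in Table 1 (the GCG entry there ends in Lys6, for instance).

```python
# 3' base -> residues at helix positions -1, 1, 2 (motifs named in the text above)
THREE_PRIME_MOTIF = {"G": "RSD", "T": "TSG", "C": "DPG", "A": "QSS"}

# Position 3 residue used with the common GNG helix, keyed by subsite
POSITION3_FOR_GNG = {"GGG": "K", "GAG": "N", "GTG": "E", "GCG": "D"}

# Common helix for GNG subsites, positions -2, -1, 1, 2, 3, 4, 5, 6;
# {x} marks the variable position 3 residue.
GNG_TEMPLATE = "SRSD{x}LVR"


def gng_helix(subsite: str) -> str:
    """Fill the SRSD-X-LVR template for one of the four GNG subsites."""
    subsite = subsite.upper()
    if subsite not in POSITION3_FOR_GNG:
        raise ValueError(f"not a GNG subsite covered by the template: {subsite!r}")
    return GNG_TEMPLATE.format(x=POSITION3_FOR_GNG[subsite])


if __name__ == "__main__":
    for site in ("GGG", "GAG", "GTG", "GCG"):
        print(site, gng_helix(site))
```

The four helices produced differ only at position 3, which is the point made in the text: the RSD motif handles the 3′ guanine, Val5 and Arg6 handle the 5′ guanine, and a single residue selects the middle base.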

The multi-target ELISA assay assumed that all the proteins preferred guanine in the 5′ position since all proteins contained Arg6 and this residue is known from structural studies to contact guanine at this position (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945, Pavletich, N. P. & Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. (1996) Proc Natl Acad Sci U S A 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. & Wright, P. E. (1997) J. Mol. Biol. 273, 183-206, Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 2938-2943). This interaction was demonstrated here using the 5′ binding site signature assay (Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 11168-72; FIG. 2, white bars). Each protein was applied to pools of 16 oligonucleotide targets in which the 5′ nucleotide of the finger-2 subsite was fixed as G, A, T, or C and the middle and 3′ nucleotides were randomized. All proteins preferred the GNN pool with essentially no cross reactivity.
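The composition of the four pools used in the 5′ binding site signature assay can be written out explicitly. The sketch below is illustrative and models only the finger-2 subsite of each oligonucleotide; the function name is hypothetical, and the flanking sequence of the actual targets is not represented.

```python
from itertools import product

BASES = "ACGT"


def signature_pools() -> dict[str, list[str]]:
    """Build one pool per fixed 5' nucleotide (G, A, T, C).

    Each pool holds the 16 finger-2 subsites obtained by randomizing the
    middle and 3' nucleotides.
    """
    return {
        five_prime: [five_prime + mid + three for mid, three in product(BASES, repeat=2)]
        for five_prime in "GATC"
    }


if __name__ == "__main__":
    for five_prime, pool in signature_pools().items():
        print(f"{five_prime}NN pool ({len(pool)} subsites):", " ".join(pool))
```

A protein that strictly requires a 5′ guanine would be expected to give signal only with the GNN pool, which is the behavior reported for all proteins in the assay.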

The results of the multi-target ELISA assay were confirmed by affinity studies of purified proteins. In cases where crossreactivity was minimal in the ELISA assay, a single nucleotide mismatch typically resulted in a greater than 100-fold loss in affinity. This degree of specificity had yet to be demonstrated with zinc finger proteins. In general, proteins selected or designed to bind subsites with G or A in the middle and 3′ position had the highest affinity, followed by those which had only one G or A in the middle or 3′ position, followed by those which contained only T or C. The former group typically bound their targets with a higher affinity than Zif268 (10 nM), the latter with somewhat lower affinity, and almost all the proteins had an affinity lower than that of the parental C7 protein. There was no correlation between binding affinity and binding specificity, suggesting that specificity can result not only from specific protein-DNA contacts, but also from interactions which exclude all but the correct nucleotide.

Asp2 was always co-selected with Arg-1 in all proteins for which the target subsite was GNG. It is now understood that there are two reasons for this. From structural studies of Zif268 (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180), it is known that Asp2 of finger 2 makes a pair of buttressing hydrogen bonds with Arg-1 which stabilize the Arg-1/3′ guanine interaction, as well as some water-mediated contacts. However, the carboxylate of Asp2 also accepts a hydrogen bond from the N4 of a cytosine that is base-paired to a 5′ guanine of the finger-1 subsite. Adenine base paired to T in this position can make an analogous contact to that seen with cytosine. This interaction is particularly important because it extends the recognition subsite of finger 2 from three nucleotides (GNG) to four (GNG(G/T)) (Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 5617-5621, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33). This phenomenon is referred to as “target site overlap”, and has three important ramifications. First, Asp2 was favored for selection by our library when the finger-2 subsite was GNG because our finger-1 subsite contained a 5′ guanine. Second, it may limit the utility of the libraries used in this study to selection on GNN or TNN finger-2 subsites because finger 3 of these libraries contains an Asp2, which may help specify the 5′ nucleotide of the finger-2 subsite to be G or T. In Zif268 and C7, which have Thr6 in finger 2, Asp2 of finger 3 enforces G or T recognition in the 5′ position (T/G)GG. This interaction may also explain why previous phage display studies, which all used Zif268-based libraries, have found selection limited primarily to GNN recognition (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci U S A 91, 11163-7, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33, Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348).

Finally, target site overlap potentially limits the use of these zinc fingers as modular building blocks. From structural data it is known that there are some zinc fingers in which target site overlap is quite extensive, such as those in GLI and YY1, and others which are similar to Zif268 and display only modest overlap. In our final set of proteins, Asp2 is found in polypeptides that bind GGG, GAG, GTG, and GCG. The overlap potential of other residues found at position 2 is largely unknown; however, structural studies reveal that many other residues found at this position may participate in such cross-subsite contacts. Fingers containing Asp2 may limit modularity, since they would require that each GNG subsite be followed by a T or G.

Table 1, below, summarizes the sequences (SEQ ID NOs:1-16) showing the highest selectivity for the sixteen embodiments of GNN target triplets.

TABLE 1

Target        Amino acid positions         SEQ ID
specificity   -1   1   2   3   4   5   6   NO:
GAA            Q   S   S   N   L   V   R    1
GAC            D   P   G   N   L   V   R    2
GAG            R   S   D   N   L   V   R    3
GAT            T   S   G   N   L   V   R    4
GCA            Q   S   G   D   L   R   R    5
GCC            D   C   R   D   L   A   R    6
GCG            R   S   D   D   L   V   K    7
GCT            T   S   G   E   L   V   R    8
GGA            Q   R   A   H   L   E   R    9
GGC            D   P   G   H   L   V   R   10
GGG            R   S   D   K   L   V   R   11
GGT            T   S   G   H   L   V   R   12
GTA            Q   S   S   S   L   V   R   13
GTC            D   P   G   A   L   V   R   14
GTG            R   S   D   E   L   V   R   15
GTT            T   S   G   S   L   V   R   16
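The entries of Table 1 lend themselves to a simple lookup. The following Python sketch is illustrative only: the names `TABLE_1` and `helices_for_target` are hypothetical, and the 9-bp example target is an arbitrary 5′-(GNN)₃-3′ sequence chosen for demonstration. It splits a target into triplets and reports the helix residues (positions -1 through 6) listed for each triplet, in 5′ to 3′ order.

```python
# Helix residues at positions -1, 1, 2, 3, 4, 5, 6 as listed in Table 1,
# keyed by the GNN triplet each domain was selected to bind.
TABLE_1 = {
    "GAA": "QSSNLVR", "GAC": "DPGNLVR", "GAG": "RSDNLVR", "GAT": "TSGNLVR",
    "GCA": "QSGDLRR", "GCC": "DCRDLAR", "GCG": "RSDDLVK", "GCT": "TSGELVR",
    "GGA": "QRAHLER", "GGC": "DPGHLVR", "GGG": "RSDKLVR", "GGT": "TSGHLVR",
    "GTA": "QSSSLVR", "GTC": "DPGALVR", "GTG": "RSDELVR", "GTT": "TSGSLVR",
}


def helices_for_target(target):
    """Split a 5'-(GNN)n-3' target into triplets and look up each helix.

    Returns a list of (triplet, helix) pairs in 5'-to-3' order.
    """
    seq = target.upper().replace(" ", "")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("target length must be a non-zero multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for triplet in triplets:
        if triplet not in TABLE_1:
            raise ValueError(f"{triplet} is not a GNN triplet from Table 1")
    return [(triplet, TABLE_1[triplet]) for triplet in triplets]


# Illustrative 9-bp target (an assumption, not a sequence singled out here).
for triplet, helix in helices_for_target("GGG GCC GGA"):
    print(triplet, helix)
```

The lookup deliberately ignores framework residues, linkers, and target site overlap; it only reproduces the tabulated helix sequences.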

The data show that all possible GNN triplet sequences can be recognized with exquisite specificity by zinc finger domains. Optimized zinc finger domains can discriminate single base differences by greater than 100-fold loss in affinity. While many of the amino acids found in the optimized proteins at the key contact positions −1, 3, and 6 are those that are consistent with a simple code of recognition, it has been discovered that optimal specific recognition is sensitive to the context in which these residues are presented. Residues at positions 1, 2, and 5 have been found to be critical for specific recognition. Further, the data demonstrate for the first time that sequence motifs at positions −1, 1, and 2 rather than the simple identity of the position 1 residue are required for highly specific recognition of the 3′ base. These residues likely provide the proper stereochemical context for interactions of the helix both in terms of recognition of specific bases and in the exclusion of other bases, the net result being highly specific interactions. Broad utility of these domains would be realized if they were modular in both their interactions with DNA and other zinc finger domains. This could be achieved by working within the likely limitations imposed by target site overlap, namely that sequences of the 5′-(GNN)_(n)-3′ type should be targeted. Ready recombination of the disclosed domains then allows for the creation of polydactyl proteins of defined specificity, precluding the need to develop phage display libraries in their generation. These polydactyl proteins have been used to activate and repress transcription driven by the human erbB-2 promoter in living cells. The family of zinc finger domains described herein is likely sufficient for the construction of 16⁶, or approximately 17 million, novel proteins that bind the 5′-(GNN)₆-3′ family of DNA sequences.
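The figure of roughly 17 million follows from choosing one of 16 domains independently at each of six finger positions. A minimal Python sketch of that count is given below; the function name `n_polydactyl_proteins` is a hypothetical label used only for illustration.

```python
def n_polydactyl_proteins(n_domains=16, n_fingers=6):
    """Count proteins formed by picking one domain per finger position."""
    return n_domains ** n_fingers


# Sixteen GNN-specific domains arranged in six-finger proteins.
print(n_polydactyl_proteins())       # 16777216
# The same count for three-finger proteins recognizing 9 bp.
print(n_polydactyl_proteins(16, 3))  # 4096
```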

The zinc finger-nucleotide binding polypeptide derivative can be derived or produced from a wild type zinc finger protein by truncation or expansion, or as a variant of the wild type-derived polypeptide by a process of site-directed mutagenesis, or by a combination of the procedures. The term “truncated” refers to a zinc finger-nucleotide binding polypeptide that contains fewer than the full number of zinc fingers found in the native zinc finger binding protein or from which non-desired sequences have been deleted. For example, truncation of the zinc finger-nucleotide binding protein TFIIIA, which naturally contains nine zinc fingers, might yield a polypeptide with only zinc fingers one through three. Expansion refers to a zinc finger polypeptide to which additional zinc finger modules have been added. For example, TFIIIA may be extended to 12 fingers by adding 3 zinc finger domains. In addition, a truncated zinc finger-nucleotide binding polypeptide may include zinc finger modules from more than one wild type polypeptide, thus resulting in a “hybrid” zinc finger-nucleotide binding polypeptide.

The term “mutagenized” refers to a zinc finger derived-nucleotide binding polypeptide that has been obtained by performing any of the known methods for accomplishing random or site-directed mutagenesis of the DNA encoding the protein. For instance, in TFIIIA, mutagenesis can be performed to replace nonconserved residues in one or more of the repeats of the consensus sequence. Truncated zinc finger-nucleotide binding proteins can also be mutagenized.

Examples of known zinc finger-nucleotide binding polypeptides that can be truncated, expanded, and/or mutagenized according to the present invention in order to inhibit the function of a nucleotide sequence containing a zinc finger-nucleotide binding motif include TFIIIA and Zif268. Other zinc finger-nucleotide binding proteins will be known to those of skill in the art.

A polypeptide of this invention can be made using a variety of standard techniques well known in the art (See, e.g., U.S. patent application Ser. No. 08/676,318, filed Jan. 18, 1995, the entire disclosure of which is incorporated herein by reference). Phage display libraries of zinc finger proteins were created and selected under conditions that favored enrichment of sequence-specific proteins. Zinc finger domains recognizing a number of sequences required refinement by site-directed mutagenesis that was guided by both phage selection data and structural information.

The murine Cys₂-His₂ zinc finger protein Zif268 is used for construction of phage display libraries (Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348). Zif268 is structurally the best characterized of the zinc finger proteins (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-87). DNA recognition in each of the three zinc finger domains of this protein is mediated by residues in the N-terminus of the α-helix contacting primarily three nucleotides on a single strand of the DNA. The operator binding site for this three-finger protein is 5′-GCGTGGGCG-3′ (the finger-2 subsite is the central TGG triplet). Structural studies of Zif268 and other related zinc finger-DNA complexes (Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945, Pavletich, N. P. & Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. (1996) Proc Natl Acad Sci U S A 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. & Wright, P. E. (1997) J. Mol. Biol. 273, 183-206, Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 2938-2943, Narayan, V. A., Kriwacki, R. W. & Caradonna, J. P. (1997) J. Biol. Chem. 272, 7801-7809) have shown that residues from primarily three positions on the α-helix, −1, 3, and 6, are involved in specific base contacts. Typically, the residue at position −1 of the α-helix contacts the 3′ base of that finger's subsite while positions 3 and 6 contact the middle base and the 5′ base, respectively.

In order to select a family of zinc finger domains recognizing the 5′-GNN-3′ subset of sequences, two highly diverse zinc finger libraries were constructed in the phage display vector pComb3H (Barbas III, C. F., Kang, A. S., Lerner, R. A. & Benkovic, S. J. (1991) Proc. Natl. Acad. Sci. USA 88, 7978-7982, Rader, C. & Barbas III, C. F. (1997) Curr. Opin. Biotechnol. 8, 503-508). Both libraries involved randomization of residues within the α-helix of finger 2 of C7, a variant of Zif268 (Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348). Library 1 was constructed by randomization of positions −1, 1, 2, 3, 5, and 6 using an NNK doping strategy, while library 2 was constructed using a VNS doping strategy with randomization of positions −2, −1, 1, 2, 3, 5, and 6. The NNK doping strategy allows for all amino acid combinations within 32 codons, while VNS precludes Tyr, Phe, Cys and all stop codons in its 24-codon set. The libraries consisted of 4.4×10⁹ and 3.5×10⁹ members, respectively, each capable of recognizing sequences of the 5′-GCGNNNGCG-3′ type. The size of the NNK library ensured that it could be surveyed with 99% confidence, while the VNS library was highly diverse but somewhat incomplete. These libraries are, however, significantly larger than previously reported zinc finger libraries (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci U S A 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33). Seven rounds of selection were performed on the zinc finger-displaying phage with each of the 16 biotinylated 5′-GCGGNNGCG-3′ hairpin DNA targets using a solution binding protocol. Stringency was increased in each round by the addition of competitor DNA.
Sheared herring sperm DNA was provided for selection against phage that bound non-specifically to DNA. Stringent selective pressure for sequence specificity was obtained by providing DNAs of the 5′-GCGNNNGCG-3′ type as specific competitors. Excess DNA of the 5′-GCGGNNGCG-3′ type was added to provide even more stringent selection against binding to DNAs with single or double base changes as compared to the biotinylated target. Phage binding to the single biotinylated DNA target sequence were recovered using streptavidin-coated beads. In some cases the selection process was repeated. The present data show that these domains are functionally modular and can be recombined with one another to create polydactyl proteins capable of binding 18-bp sequences with subnanomolar affinity. The family of zinc finger domains described herein is sufficient for the construction of 17 million novel proteins that bind the 5′-(GNN)₆-3′ family of DNA sequences.
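The codon counts quoted for the NNK and VNS doping strategies can be checked by enumeration. The Python sketch below is a hedged illustration, not the procedure used to build the libraries: it assumes the standard genetic code and the IUPAC degenerate-base meanings (N = A/C/G/T, K = G/T, V = A/C/G, S = C/G), and the function names are hypothetical. The `sampling_coverage` helper additionally assumes uniform, Poisson-like sampling of DNA variants, which is one common way to reason about library completeness; how the 99% confidence figure was actually derived is not stated here.

```python
from itertools import product
from math import exp

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC degenerate base symbols used by the two doping schemes.
IUPAC = {"N": "ACGT", "K": "GT", "V": "ACG", "S": "CG"}


def expand(scheme):
    """List every codon encoded by a three-letter degenerate scheme."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]


def summarize(scheme):
    """Return (codon count, set of encoded amino acids, contains a stop?)."""
    codons = expand(scheme)
    encoded = {CODON_TABLE[c] for c in codons}
    return len(codons), encoded - {"*"}, "*" in encoded


def sampling_coverage(library_size, scheme, n_positions):
    """Chance that a given DNA variant occurs at least once (Poisson model)."""
    n_variants = len(expand(scheme)) ** n_positions
    return 1.0 - exp(-library_size / n_variants)


for scheme in ("NNK", "VNS"):
    n_codons, residues, has_stop = summarize(scheme)
    print(scheme, n_codons, "".join(sorted(residues)), "stop:", has_stop)

# Library sizes and numbers of randomized positions as given in the text.
print(sampling_coverage(4.4e9, "NNK", 6))
print(sampling_coverage(3.5e9, "VNS", 7))
```

Under the standard genetic code the VNS enumeration also lacks a Trp codon, because TGG begins with T; the printed residue sets make this easy to inspect.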

The invention includes a nucleotide sequence encoding a zinc finger-nucleotide binding polypeptide. DNA sequences encoding the zinc finger-nucleotide binding polypeptides of the invention, including native, truncated, and expanded polypeptides, can be obtained by several methods. For example, the DNA can be isolated using hybridization procedures which are well known in the art. These include, but are not limited to: (1) hybridization of probes to genomic or cDNA libraries to detect shared nucleotide sequences; (2) antibody screening of expression libraries to detect shared structural features; and (3) synthesis by the polymerase chain reaction (PCR). RNA sequences of the invention can be obtained by methods known in the art (See, for example, Current Protocols in Molecular Biology, Ausubel, et al. Eds., 1989).

Specific DNA sequences encoding zinc finger-nucleotide binding polypeptides of the invention can be developed by: (1) isolation of a double-stranded DNA sequence from the genomic DNA; (2) chemical manufacture of a DNA sequence to provide the necessary codons for the polypeptide of interest; and (3) in vitro synthesis of a double-stranded DNA sequence by reverse transcription of mRNA isolated from a eukaryotic donor cell. In the latter case, a double-stranded DNA complement of mRNA is eventually formed which is generally referred to as cDNA. Of these three methods for developing specific DNA sequences for use in recombinant procedures, the isolation of genomic DNA is the least common. This is especially true when it is desirable to obtain the microbial expression of mammalian polypeptides, due to the presence of introns.

For obtaining zinc finger derived-DNA binding polypeptides, the synthesis of DNA sequences is frequently the method of choice when the entire sequence of amino acid residues of the desired polypeptide product is known. When the entire sequence of amino acid residues of the desired polypeptide is not known, the direct synthesis of DNA sequences is not possible and the method of choice is the formation of cDNA sequences. Among the standard procedures for isolating cDNA sequences of interest is the formation of plasmid-carrying cDNA libraries which are derived from reverse transcription of mRNA which is abundant in donor cells that have a high level of genetic expression. When used in combination with polymerase chain reaction technology, even rare expression products can be cloned. In those cases where significant portions of the amino acid sequence of the polypeptide are known, the production of labeled single or double-stranded DNA or RNA probe sequences duplicating a sequence putatively present in the target cDNA may be employed in DNA/DNA hybridization procedures which are carried out on cloned copies of the cDNA which have been denatured into a single-stranded form (Jay, et al., Nucleic Acid Research 11:2325, 1983).

In another aspect, the present invention provides a pharmaceutical composition comprising a therapeutically effective amount of a zinc finger-nucleotide binding polypeptide or a therapeutically effective amount of a nucleotide sequence that encodes a zinc finger-nucleotide binding polypeptide in combination with a pharmaceutically acceptable carrier.

As used herein, the terms “pharmaceutically acceptable”, “physiologically tolerable” and grammatical variations thereof, as they refer to compositions, carriers, diluents and reagents, are used interchangeably and represent that the materials are capable of administration to or upon a human without the production of undesirable physiological effects such as nausea, dizziness, gastric upset and the like which would be to a degree that would prohibit administration of the composition.

The preparation of a pharmacological composition that contains active ingredients dissolved or dispersed therein is well understood in the art. Typically such compositions are prepared as sterile injectables either as liquid solutions or suspensions, aqueous or non-aqueous; however, solid forms suitable for solution or suspension in liquid prior to use can also be prepared. The preparation can also be emulsified.

The active ingredient can be mixed with excipients which are pharmaceutically acceptable and compatible with the active ingredient and in amounts suitable for use in the therapeutic methods described herein. Suitable excipients are, for example, water, saline, dextrose, glycerol, ethanol or the like and combinations thereof. In addition, if desired, the composition can contain minor amounts of auxiliary substances such as wetting or emulsifying agents, as well as pH buffering agents and the like which enhance the effectiveness of the active ingredient.

The therapeutic pharmaceutical composition of the present invention can include pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the polypeptide) that are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like.

Physiologically tolerable carriers are well known in the art. Exemplary of liquid carriers are sterile aqueous solutions that contain no materials in addition to the active ingredients and water, or contain a buffer such as sodium phosphate at physiological pH value, physiological saline or both, such as phosphate-buffered saline. Still further, aqueous carriers can contain more than one buffer salt, as well as salts such as sodium and potassium chlorides, dextrose, propylene glycol, polyethylene glycol and other solutes. Liquid compositions can also contain liquid phases in addition to and to the exclusion of water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as cottonseed oil, organic esters such as ethyl oleate, and water-oil emulsions.

III. Compositions

In another aspect, the present invention provides a plurality of zinc finger-nucleotide binding polypeptides operatively linked in such a manner to specifically bind a nucleotide target motif defined as 5′-(GNN)_(n)-3′, where n is an integer greater than 1. Preferably, n is an integer from 2 to about 6.

Means for linking zinc finger-nucleotide binding polypeptides are described hereinafter in the Examples as well as in U.S. patent application Ser. No. 08/676,318 (filed Jan. 18, 1995). The individual polypeptides are preferably linked with oligopeptide linkers. Such linkers preferably resemble the linkers that are found in naturally occurring zinc finger proteins. A preferred linker for use in the present invention is the amino acid residue sequence TGEKP (SEQ ID NO:111).

To examine the efficacy of making such compositions and their use in gene control, the human erbB-2 and erbB-3 genes were chosen as a model. A polydactyl protein specifically recognizing an 18-bp sequence in the 5′-untranslated region of the erbB-2 gene was converted into a transcriptional repressor by fusion with KRAB, ERD, or SID repressor domains. Transcriptional activators were generated by fusion with the herpes simplex VP16 activation domain or with a tetrameric repeat of VP16's minimal activation domain, termed VP64. The data show for the first time that both gene repression and activation can be achieved by targeting designed proteins to a single site within the transcribed region of a gene.

The human erbB-2 and erbB-3 genes were chosen as model targets for the development of zinc finger-based transcriptional switches. Members of the ErbB receptor family play important roles in the development of human malignancies. In particular, erbB-2 is overexpressed as a result of gene amplification and/or transcriptional deregulation in a high percentage of human adenocarcinomas arising at numerous sites, including breast, ovary, lung, stomach, and salivary gland (Hynes, N. E. & Stern, D. F. (1994) Biochim. Biophys. Acta 1198, 165-184). Increased expression of ErbB-2 leads to constitutive activation of its intrinsic tyrosine kinase, and has been shown to cause the transformation of cultured cells. Numerous clinical studies have shown that patients bearing tumors with elevated ErbB-2 expression levels have a poorer prognosis (Hynes, N. E. & Stern, D. F. (1994) Biochim. Biophys. Acta 1198, 165-184). In addition to its involvement in human cancer, erbB-2 plays important biological roles, both in the adult and during embryonal development of mammals (Hynes, N. E. & Stern, D. F. (1994) Biochim. Biophys. Acta 1198, 165-184, Altiok, N., Bessereau, J.-L. & Changeux, J.-P. (1995) EMBO J. 14, 4258-4266, Lee, K.-F., Simon, H., Chen, H., Bates, B., Hung, M.-C. & Hauser, C. (1995) Nature 378, 394-398).

The erbB-2 promoter therefore represents an interesting test case for the development of artificial transcriptional regulators. This promoter has been characterized in detail and has been shown to be relatively complex, containing both a TATA-dependent and a TATA-independent transcriptional initiation site (Ishii, S., Imamoto, F., Yamanashi, Y., Toyoshima, K. & Yamamoto, T. (1987) Proc. Natl. Acad. Sci. USA 84, 4374-4378). Whereas early studies showed that polydactyl proteins could act as transcriptional regulators that specifically activate or repress transcription, these proteins bound upstream of an artificial promoter to six tandem repeats of the protein's binding site (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530). Furthermore, this study utilized polydactyl proteins that were not modified in their binding specificity. Herein, we tested the efficacy of polydactyl proteins assembled from predefined building blocks to bind a single site in the native erbB-2 promoter. Described above is the generation and characterization of a family of zinc finger domains that bind each of the 16 5′-GNN-3′ DNA triplets. One reason we focused on the production of this family of recognition domains is that promoter regions of most organisms are relatively GC rich in their base content. Thus, if proteins recognizing 5′-(GNN)_(x)-3′ sites could be readily assembled from this set of defined zinc finger domains, many genes could be rapidly and specifically targeted for regulation. A protein containing six zinc finger domains and recognizing 18 bp of DNA should be sufficient to define a single address within all known genomes. Examination of the erbB-2 promoter region revealed two 5′-(GNN)₆-3′ sites and one 5′-(GNN)₉-3′ site. One of these sites, identified here as e2c, falls within the 5′-untranslated region of the erbB-2 gene and was chosen as the target site for the generation of a gene-specific transcriptional switch.
A BLAST sequence similarity search of the GenBank data base confirmed that this sequence is unique to erbB-2. The position of the e2c target sequence, downstream and in the vicinity of the two major transcription initiation sites, allowed for the examination of repression through inhibition of either transcription initiation or elongation. An interesting feature of the e2c target site is that it is found within a short stretch of sequence that is conserved between human, rat, and mouse erbB-2 genes (White, M. R.-A. & Hung, M.-C. (1992) Oncogene 7, 677-683). Thus, targeting of this site would allow for the study of this strategy in animal models prior to its application to human disease. For generating polydactyl proteins with desired DNA-binding specificity, the present studies have focused on the assembly of predefined zinc finger domains, which contrasts with the sequential selection strategy proposed by Greisman and Pabo (Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661). Such a strategy would require the sequential generation and selection of six zinc finger libraries for each required protein, making this experimental approach inaccessible to most laboratories and extremely time consuming to all. Further, since it is difficult to apply specific negative selection against binding alternative sequences in this strategy, proteins may result that are relatively unspecific, as was recently reported (Kim, J.-S. & Pabo, C. O. (1997) J. Biol. Chem. 272, 29795-29800). The general utility of two different strategies for generating three-finger proteins recognizing 9 bp of DNA sequence was investigated. Each strategy was based on the modular nature of the zinc finger domain and took advantage of a family of zinc finger domains recognizing triplets of the 5′-GNN-3′ type.
Two three-finger proteins recognizing half-sites (HS) 1 and 2 of the 5′-(GNN)₆-3′ erbB-2 target site e2c were generated in the first strategy by fusing the pre-defined finger 2 (F2) domain variants together using a PCR assembly strategy. To examine the generality of this approach, three additional three-finger proteins recognizing sequences of the 5′-(GNN)₃-3′ type were prepared using the same approach. Purified zinc finger proteins were prepared as fusions with the maltose binding protein (MBP). ELISA analysis revealed that serially connected F2 proteins were able to act in concert to specifically recognize the desired 9-bp DNA target sequences. Each of the 5 proteins shown was able to discriminate between target and non-target 5′-(GNN)₃-3′ sequences.

The affinity of each of the proteins for its target was determined by electrophoretic mobility-shift assays. These studies demonstrated that the zinc finger peptides have affinities comparable to Zif268 and other natural transcription factors, with K_(d) values that ranged from 3 to 70 nM. Here, the K_(d) of Zif268 for its operator was found to be 10 nM. It must be noted that, for reasons that remain to be explained, one group has reported K_(d) values for the natural Zif268 protein that range from 6 nM to 10 pM, a 600-fold variation (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17, Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661). Most studies have reported the K_(d) of the Zif268-DNA interaction to be from 3 to 10 nM (Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167, Hamilton, T. B., Borel, F. & Romaniuk, P. J. (1998) Biochemistry 37, 2051-2058). Thus, in order to compare the results reported here with those reported elsewhere, the relative K_(d)s should be compared, (Mutant K_(d))/(Zif268 K_(d)), where both values are derived from the same report. The present data compare favorably to other studies of novel three-finger proteins prepared using phage display, where affinities 10- to 200-fold weaker than Zif268 were reported (Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661, Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372, 642-5).
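The normalization described above, (Mutant K_(d))/(Zif268 K_(d)), is a simple ratio. The Python sketch below applies it to the K_(d) range and the Zif268 reference value stated in this paragraph; the function name `relative_kd` is a hypothetical label, and the only inputs are numbers quoted in the text.

```python
def relative_kd(mutant_kd_nm, zif268_kd_nm):
    """Ratio of a protein's K_d to the Zif268 K_d from the same report.

    Values above 1 indicate weaker binding than Zif268, values below 1
    indicate tighter binding.
    """
    if zif268_kd_nm <= 0:
        raise ValueError("reference K_d must be positive")
    return mutant_kd_nm / zif268_kd_nm


ZIF268_KD_NM = 10.0  # Zif268 K_d for its operator, as stated in the text

# Extremes of the 3 to 70 nM range reported for the three-finger proteins.
for kd_nm in (3.0, 70.0):
    print(kd_nm, "nM ->", relative_kd(kd_nm, ZIF268_KD_NM))

# Spread between 6 nM and 10 pM (0.010 nM) reported elsewhere for Zif268.
print(relative_kd(6.0, 0.010))
```

On this scale the proteins span roughly 0.3 to 7 times the Zif268 K_(d), which is the basis for comparing them with the 10- to 200-fold weaker binders cited from other reports.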

As an alternative to the serial connection of F2 domain variants, in the second strategy, three-finger proteins specific for the two e2c 5′-(GNN)₃-3′ half-sites were produced by “helix grafting”. The framework residues of the zinc finger domains, those residues that support the presentation of the recognition helix, vary between proteins. We anticipated that the framework residues may play a role in affinity and specificity. For helix grafting, amino acid positions −2 to 6 of the DNA recognition helices were grafted into either a Zif268 (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17) or an Sp1C framework (Desjarlais, J. R. & Berg, J. M. (1993) Proc. Natl. Acad. Sci. USA 90, 2256-60). The Sp1C protein is a designed consensus protein shown to have enhanced stability towards chelating agents. The proteins were expressed from DNA templates prepared by a rapid PCR-based gene assembly strategy. In each case, ELISA analysis of MBP fusion proteins showed that the DNA binding specificities and affinities observed with the F2 framework constructs were retained.

As discussed above, the recognition of 9 bp of DNA sequence is not sufficient to specify a unique site within a complex genome. In contrast, a six-finger protein recognizing 18 bp of contiguous DNA sequence could define a single site in the human genome, thus fulfilling an important prerequisite for the generation of a gene-specific transcriptional switch. Six-finger proteins binding the erbB-2 target sequence e2c were generated from three-finger constructs by simple restriction enzyme digestion and cloning with F2, Zif268, and Sp1C framework template DNAs. ELISA analysis of purified MBP fusion proteins showed that each of the six-finger proteins was able to recognize the specific target sequence, with little cross reactivity to non-target 5′-(GNN)₆-3′ sites or a tandem repeat of the Zif268 target site.
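The contrast between 9-bp and 18-bp recognition can be made concrete with a back-of-the-envelope estimate. The Python sketch below is an illustration under stated assumptions: it treats the genome as a random sequence with equal base frequencies, takes a human genome size of about 3×10⁹ bp, and counts both strands. The function name is hypothetical, and real genomes are not random, so the numbers are indicative only.

```python
def expected_random_matches(site_length_bp, genome_size_bp, both_strands=True):
    """Expected chance occurrences of one specific site in a random genome."""
    n_possible_sites = 4 ** site_length_bp
    n_positions = genome_size_bp * (2 if both_strands else 1)
    return n_positions / n_possible_sites


HUMAN_GENOME_BP = 3_000_000_000  # assumed approximate haploid genome size

for length in (9, 18):
    matches = expected_random_matches(length, HUMAN_GENOME_BP)
    print(f"{length} bp: 4^{length} = {4 ** length} sequences, "
          f"~{matches:.3g} expected chance matches")
```

Under these assumptions a 9-bp site is expected to recur many thousands of times, whereas an 18-bp site is expected to occur by chance less than once.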

The affinity of each protein for the e2c DNA target site was determined by gel-shift analysis. A modest K_(d) value of 25 nM was observed with the E2C(F2) six-finger protein constructed from the F2 framework, a value that is only 2 to 3 times better than its constituent three-finger proteins. In our previous studies of six-finger proteins, we observed approximately 70-fold enhanced affinity of the six-finger proteins for their DNA ligand as compared to their three-finger constituents (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530). The absence of a substantial increase in the affinity of the E2C(F2) peptide suggested that serial connection of F2 domains is not optimal. It is possible that the periodicity of the F2 domains of the six-finger protein does not match that of the DNA over this extended sequence, and that a significant fraction of the binding energy of this protein is spent in unwinding DNA (Shi, Y. & Berg, J. M. (1996) Biochemistry 35, 3845-8). In contrast to the F2 domain protein, the E2C(Zif) and E2C(Sp1) six-finger proteins displayed 40- to 70-fold increased affinity as compared to their original three-finger protein constituents, with K_(d) values of 1.6 nM and 0.5 nM, respectively. Significantly, both three-finger components of these proteins were involved in binding, since mutation of either half-site led to a roughly 100-fold decrease in affinity. The preponderance of known transcription factors bind their specific DNA ligands with nanomolar affinity, suggesting that the control of gene expression is governed by protein/DNA complexes of unexceptional life times. Thus, zinc finger proteins of increased affinity should not be required and could be disadvantageous, especially if binding to non-specific DNA is also increased.
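The K_(d) values reported above for the three six-finger proteins can be placed on a free-energy scale using the standard relation ΔG = RT ln(K_(d)). The Python sketch below is a hedged illustration: the temperature of 298.15 K and the 1 M standard state are assumptions made here, since the assay temperature is not stated, and the function and dictionary names are hypothetical.

```python
from math import log

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy, dG = RT ln(Kd), in kcal/mol.

    Assumes a 1 M standard state; more negative means tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * log(kd_molar)


# K_d values (nM) for the six-finger proteins, as reported in the text.
KD_NM = {"E2C(F2)": 25.0, "E2C(Zif)": 1.6, "E2C(Sp1)": 0.5}

for name, kd_nm in KD_NM.items():
    dg = binding_free_energy(kd_nm * 1e-9)
    fold_vs_f2 = KD_NM["E2C(F2)"] / kd_nm
    print(f"{name}: Kd = {kd_nm} nM, dG = {dg:.2f} kcal/mol, "
          f"{fold_vs_f2:.1f}-fold tighter than E2C(F2)")
```

Because the scale is logarithmic, a 50-fold difference in K_(d) corresponds to only a few kcal/mol, which is consistent with the point that modest changes in framework can have large effects on measured affinity.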

The zinc finger domain is generally considered to be modular in nature, with each finger recognizing a 3-bp subsite (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17). This is supported by our ability to recombine zinc finger domains in any desired sequence, yielding polydactyl proteins recognizing extended sequences of the structure 5′-(GNN)_(x)-3′. However, it should be noted that at least in some cases, zinc finger domains appear to specify overlapping 4-bp sites rather than individual 3-bp sites. In Zif268, residues in addition to those found at helix positions −1, 3, and 6 are involved in contacting DNA (Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure 4, 1171-1180). Specifically, an aspartate in helix position 2 of F2 plays several roles in recognition and makes a variety of contacts. The carboxylate of the aspartate side chain hydrogen bonds with arginine at position −1, stabilizing its interaction with the 3′-guanine of its target site. This aspartate also participates in water-mediated contacts with the guanine's complementary cytosine. In addition, this carboxylate is observed to make a direct contact to the N4 of the cytosine base on the opposite strand of the 5′-guanine base of the finger 1 binding site. It is this interaction which is the chemical basis for target site overlap. Indeed, when the Zif268 F2 libraries were selected against the four 5′-GCG GNG GCG-3′ sequences, both an arginine at position −1 and an aspartate at position 2 were obtained, analogous to the residues in native Zif268. Since the e2c target sequence (5′-GGG GCC GGA GCC GCA GTG-3′) (SEQ ID NO: 112) is followed by an A rather than a G, a potential target site overlap problem was anticipated with finger 1 of an e2c-specific six-finger protein.
However, in both the Zif- and Sp1C-framework six-finger proteins, the GTG-specific finger 1 containing an aspartate at position 2 appears to recognize the sequences 5′-GTGA-3′ and 5′-GTGG-3′ equally well, as indicated by their very similar affinities to target sites e2c-a and e2c-g.
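The target site overlap consideration discussed above can be expressed as a simple screen. The Python sketch below is illustrative only and rests on two assumptions drawn from the text: that fingers binding GNG triplets carry an aspartate at position 2, and that such a finger would prefer the base immediately 3′ of its triplet to be G or T. The function name `overlap_conflicts` is hypothetical. As noted for the E2C proteins, a flagged position is a point to examine, not a prediction of poor binding.

```python
def overlap_conflicts(site, downstream_base):
    """Flag GNG triplets whose 3'-flanking base is neither G nor T.

    `site` is a 5'-(GNN)n-3' target; `downstream_base` is the single base
    immediately 3' of the site. Returns a list of (triplet, flanking base).
    """
    seq = site.upper().replace(" ", "")
    flank = downstream_base.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("site length must be a non-zero multiple of 3")
    if len(flank) != 1 or flank not in "ACGT":
        raise ValueError("downstream_base must be a single DNA base")
    extended = seq + flank
    conflicts = []
    for start in range(0, len(seq), 3):
        triplet = extended[start:start + 3]
        following = extended[start + 3]
        is_gng = triplet[0] == "G" and triplet[2] == "G"
        if is_gng and following not in "GT":
            conflicts.append((triplet, following))
    return conflicts


E2C = "GGG GCC GGA GCC GCA GTG"  # e2c target sequence from the text

print(overlap_conflicts(E2C, "A"))  # [('GTG', 'A')]
print(overlap_conflicts(E2C, "G"))  # []
```

The first call reproduces the anticipated concern for the GTG-specific finger 1, since the e2c site is followed by an A; the internal GGG triplet is not flagged because it is followed by the G of the next GNN triplet.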

A polynucleotide or composition of this invention, as set forth above, can be operatively linked to one or more transcription modulating factors. Modulating factors such as transcription activators or transcription suppressors or repressors are well known in the art. Means for operatively linking polypeptides to such factors are also well known in the art. Exemplary and preferred such factors and their use to modulate gene expression are discussed in detail hereinafter.

II Uses

In one embodiment, a method of the invention includes a process for modulating (inhibiting or suppressing) the function of a nucleotide sequence comprising a zinc finger-nucleotide binding motif which comprises contacting the zinc finger-nucleotide binding motif with an effective amount of a zinc finger-nucleotide binding polypeptide that binds to the motif. In the case where the nucleotide sequence is a promoter, the method includes inhibiting the transcriptional transactivation of a promoter containing a zinc finger-DNA binding motif. The term “inhibiting” refers to the suppression of the level of activation of transcription of a structural gene operably linked to a promoter containing a zinc finger-nucleotide binding motif, for example. In addition, the zinc finger-nucleotide binding polypeptide derivative may bind a motif within a structural gene or within an RNA sequence.

The term “effective amount” includes that amount which results in thedeactivation of a previously activated promoter or that amount whichresults in the inactivation of a promoter containing a zincfinger-nucleotide binding motif, or that amount which blockstranscription of a structural gene or translation of RNA. The amount ofzinc finger derived-nucleotide binding polypeptide required is thatamount necessary to either displace a native zinc finger-nucleotidebinding protein in an existing protein/promoter complex, or that amountnecessary to compete with the native zinc finger-nucleotide bindingprotein to form a complex with the promoter itself. Similarly, theamount required to block a structural gene or RNA is that amount whichbinds to and blocks RNA polymerase from reading through on the gene orthat amount which inhibits translation, respectively. Preferably, themethod is performed intracellularly. By functionally inactivating apromoter or structural gene, transcription or translation is suppressed.Delivery of an effective amount of the inhibitory protein for binding toor “contacting” the cellular nucleotide sequence containing the zincfinger-nucleotide binding protein motif, can be accomplished by one ofthe mechanisms described herein, such as by retroviral vectors orliposomes, or other methods well known in the art.

The term “modulating” refers to the suppression, enhancement or induction of a function. For example, the zinc finger-nucleotide binding polypeptide of the invention may modulate a promoter sequence by binding to a motif within the promoter, thereby enhancing or suppressing transcription of a gene operatively linked to the promoter nucleotide sequence. Alternatively, modulation may include inhibition of transcription of a gene where the zinc finger-nucleotide binding polypeptide binds to the structural gene and blocks DNA dependent RNA polymerase from reading through the gene, thus inhibiting transcription of the gene. The structural gene may be a normal cellular gene or an oncogene, for example. Alternatively, modulation may include inhibition of translation of a transcript.

The promoter region of a gene includes the regulatory elements that typically lie 5′ to a structural gene. If a gene is to be activated, proteins known as transcription factors attach to the promoter region of the gene. This assembly resembles an “on switch” by enabling an enzyme to transcribe a second genetic segment from DNA to RNA. In most cases the resulting RNA molecule serves as a template for synthesis of a specific protein; sometimes RNA itself is the final product.

The promoter region may be a normal cellular promoter or, for example, an onco-promoter. An onco-promoter is generally a virus-derived promoter. For example, the long terminal repeat (LTR) of retroviruses is a promoter region which may be a target for a zinc finger binding polypeptide variant of the invention. Promoters from members of the Lentivirus group, which include such pathogens as human T-cell lymphotropic virus (HTLV) 1 and 2, or human immunodeficiency virus (HIV) 1 or 2, are examples of viral promoter regions which may be targeted for transcriptional modulation by a zinc finger binding polypeptide of the invention.

In order to test the concept of using zinc finger proteins as gene-specific transcriptional regulators, the E2C(Sp1) six-finger protein was fused to a number of effector domains. Transcriptional repressors were generated by attaching any one of three human-derived repressor domains to the zinc finger protein. The first repressor protein was prepared using the ERF repressor domain (ERD) (Sgouras, D. N., Athanasiou, M. A., Beal, G. J., Jr., Fisher, R. J., Blair, D. G. & Mavrothalassitis, G. J. (1995) EMBO J. 14, 4781-4793), defined by amino acids 473 to 530 of the ets2 repressor factor (ERF). This domain mediates the antagonistic effect of ERF on the activity of transcription factors of the ets family. A synthetic repressor was constructed by fusion of this domain to the C-terminus of the zinc finger protein. The second repressor protein was prepared using the Kruppel-associated box (KRAB) domain (Margolin, J. F., Friedman, J. R., Meyer, W., K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513). This repressor domain is commonly found at the N-terminus of zinc finger proteins and presumably exerts its repressive activity on TATA-dependent transcription in a distance- and orientation-independent manner (Pengue, G. & Lania, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1015-1020), by interacting with the RING finger protein KAP-1 (Friedman, J. R., Fredericks, W. J., Jensen, D. E., Speicher, D. W., Huang, X.-P., Neilson, E. G. & Rauscher III, F. J. (1996) Genes & Dev. 10, 2067-2078). We utilized the KRAB domain found between amino acids 1 and 97 of the zinc finger protein KOX1 (Margolin, J. F., Friedman, J. R., Meyer, W., K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513). In this case an N-terminal fusion with the six-finger protein was constructed. Finally, to explore the utility of histone deacetylation for repression, amino acids 1 to 36 of the Mad mSIN3 interaction domain (SID) were fused to the N-terminus of the zinc finger protein (Ayer, D. E., Laherty, C. D., Lawrence, Q. A., Armstrong, A. P. & Eisenman, R. N. (1996) Mol. Cell. Biol. 16, 5772-5781). This small domain is found at the N-terminus of the transcription factor Mad and is responsible for mediating its transcriptional repression by interacting with mSIN3, which in turn interacts with the co-repressor N-CoR and with the histone deacetylase mRPD1 (Heinzel, T., Lavinsky, R. M., Mullen, T.-M., Söderström, M., Laherty, C. D., Torchia, J., Yang, W.-M., Brard, G., Ngo, S. D. & al., e. (1997) Nature 387, 43-46). To examine gene-specific activation, transcriptional activators were generated by fusing the zinc finger protein to amino acids 413 to 489 of the herpes simplex virus VP16 protein (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335, 563-564), or to an artificial tetrameric repeat of VP16's minimal activation domain, DALDDFDLDML (SEQ ID NO:113) (Seipel, K., Georgiev, O. & Schaffner, W. (1992) EMBO J. 11, 4961-4968), termed VP64.

Reporter constructs containing fragments of the erbB-2 promoter coupled to a luciferase reporter gene were generated to test the specific activities of our designed transcriptional regulators. The target reporter plasmid contained nucleotides -758 to -1 with respect to the ATG initiation codon, whereas the control reporter plasmid contained nucleotides -1571 to -24, thus lacking all but one nucleotide of the E2C binding site encompassed in positions -24 to -7. Both promoter fragments displayed similar activities when transfected transiently into HeLa cells, in agreement with previous observations (Hudson, L. G., Ertl, A. P. & Gill, G. N. (1990) J. Biol. Chem. 265, 4389-4393). To test the effect of zinc finger-repressor domain fusion constructs on erbB-2 promoter activity, HeLa cells were transiently co-transfected with each of the zinc finger expression vectors and the luciferase reporter constructs. Significant repression was observed with each construct. The ERD and SID fusion proteins produced approximately 50% and 80% repression, respectively. The most potent repressor was the KRAB fusion protein. This protein caused complete repression of erbB-2 promoter activity. The observed residual activity was at the background level of the promoter-less pGL3 reporter. In contrast, none of the proteins caused significant repression of the control erbB-2 reporter construct lacking the E2C target site, demonstrating that repression is indeed mediated by specific binding of the E2C(Sp1) protein to its target site. Expression of a zinc finger protein lacking any effector domain resulted in weak repression, approximately 30%, indicating that most of the repression observed with the SID and KRAB constructs is caused by their effector domains, rather than by DNA-binding alone. This observation strongly suggests that the mechanism of repression is active inhibition of transcription initiation rather than of elongation. Once initiation of transcription by RNA polymerase II has occurred, the zinc finger protein appears to be readily displaced from the DNA by the action of the polymerase.

The utility of gene-specific polydactyl proteins to mediate activation of transcription was investigated using the same two reporter constructs. The VP16 fusion protein was found to stimulate transcription approximately 5-fold, whereas the VP64 fusion protein produced a 27-fold activation. This dramatic stimulation of promoter activity caused by a single VP16-based transcriptional activator is exceptional in view of the fact that the zinc finger protein binds in the transcribed region of the gene. This again demonstrates that mere binding of a zinc finger protein, even one with sub-nanomolar affinity, in the path of RNA polymerase II need not necessarily negatively affect gene expression.

The data herein show that zinc finger proteins capable of binding novel 9- and 18-bp DNA target sites can be rapidly prepared using pre-defined domains recognizing 5′-GNN-3′ sites. This information is sufficient for the preparation of 16⁶, or approximately 17 million, novel six-finger proteins, each capable of binding 18 bp of DNA sequence. This rapid methodology for the construction of novel zinc finger proteins has advantages over the sequential generation and selection of zinc finger domains proposed by others (Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661) and takes advantage of structural information that suggests that the potential for the target overlap problem as defined above might be avoided in proteins targeting 5′-GNN-3′ sites. Using the complex and well studied erbB-2 promoter and live human cells, the data demonstrate that these proteins, when provided with the appropriate effector domain, can be used to provoke or activate expression and to produce graded levels of repression down to the level of the background in these experiments. These studies suggest that the KRAB domain is significantly more potent as a transcriptional repressor than ERD or SID domains, and that it is able to inhibit both the TATA-dependent and the TATA-independent transcriptional initiation of this promoter. These repressor domains have not previously been directly compared. The present strategy of using predefined zinc finger domains to construct polydactyl proteins coupled to effector domains has significant advantages over strategies that attempt to only repress transcription by competing or interfering with proteins involved in the transcription complex (Kim, J.-S. & Pabo, C. O. (1997) J. Biol. Chem. 272, 29795-29800, Kim, J.-S., Kim, J., Cepek, K. L., Sharp, P. A. & Pabo, C. O. (1997) Proc. Natl. Acad. Sci. USA 94, 3616-3620). 
Utilization ofeffector domains that have the potential to act over a distance shouldallow the application of these gene-switches to the regulation ofuncharacterized genes and promoters. Since these transcriptionalregulators might be prepared using our PCR-assembly strategy in ahigh-throughput fashion, we believe it is appropriate to comment ontheir potential practical applications. Novel DNA binding proteinsgenerated in this manner should have potential utility in DNA-baseddiagnostic applications. For the study of gene function, we believe thatthe ability to both activate and repress the transcription of genes, atgraded levels if necessary, may assist in assigning gene function. Sincethese proteins exert their control by acting in trans, functional geneknockout or activation might be produced in heterozygous transgenicanimals. This would drastically reduce the time required to produce agene knockout in a whole animal and would extend the range of organismsto which knockout technology might be applied. These proteins might alsobe used in gene therapy applications to inhibit the production of viralgene products or to activate genes involved in fighting disease.Significantly, the ease with which these proteins can be prepared willfacilitate the testing of these ideas by the scientific community.
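
The figure of roughly 17 million six-finger proteins follows from simple counting: there are 16 triplets of the form 5′-GNN-3′, and each of the six finger positions can be filled independently. The short Python sketch below reproduces that arithmetic. The function and variable names are illustrative only and do not come from the specification.

```python
from itertools import product

BASES = "ACGT"

# Enumerate the 16 triplets of the form 5'-GNN-3'.
gnn_triplets = ["G" + "".join(pair) for pair in product(BASES, repeat=2)]


def count_proteins(n_fingers, n_domains=len(gnn_triplets)):
    """Number of finger arrangements when each position is chosen
    independently from n_domains predefined domains."""
    return n_domains ** n_fingers


three_finger_count = count_proteins(3)  # proteins binding 9-bp sites
six_finger_count = count_proteins(6)    # proteins binding 18-bp sites

print(len(gnn_triplets), three_finger_count, six_finger_count)
```

Under these assumptions the six-finger count is 16⁶ = 16,777,216, which the text rounds to 17 million.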

The Examples that follow illustrate preferred embodiments of the present invention and are not limiting of the specification or claims in any way.

Example 1 Selection by Phage Display

Construction of zinc-finger libraries by PCR overlap extension was essentially as previously described (Shi, Y. & Berg, J. M. (1996) Biochemistry 35, 3845-8). Growth and precipitation of phage were as previously described (Pengue, G. & Lania, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1015-1020, Friedman, J. R., Fredericks, W. J., Jensen, D. E., Speicher, D. W., Huang, X.-P., Neilson, E. G. & Rauscher III, F. J. (1996) Genes & Dev. 10, 2067-2078), except that ER2537 cells (New England Biolabs) were used to propagate the phage and 90 μM ZnCl₂ was added to the growth media. Precipitated phage were resuspended in Zinc Buffer A (ZBA; 10 mM Tris, pH 7.5/90 mM KCl, 1 mM MgCl₂, 90 μM ZnCl₂)/1% BSA/5 mM DTT. Binding reactions (500 μl: ZBA/5 mM DTT/1% Blotto (BioRad)/competitor oligonucleotides/4 μg sheared herring sperm DNA (Sigma)/100 μl filtered phage (≈10¹³ colony forming units)) were incubated for 30 minutes at room temperature, prior to the addition of 72 nM biotinylated hairpin target oligonucleotide. Incubation continued for 3.5 hours with constant gentle mixing. Streptavidin-coated magnetic beads (50 μl; Dynal) were washed twice with 500 μl ZBA/1% BSA, then blocked with 500 μl ZBA/5% Blotto/antibody-displaying (irrelevant) phage (≈10¹² colony forming units) for ≈4 hours at room temperature. At the end of the binding period, the blocking solution was replaced by the binding reaction and incubated 1 hour at room temperature. The beads were washed 10 times over a 1 hour period with 500 μl ZBA/5 mM DTT/2% Tween 20, then once without Tween 20. Bound phage were eluted for 30 minutes with 10 μg/μl trypsin.

Hairpin target oligonucleotides had the sequence 5′-Biotin-GGACGCN′N′N′CGCGGGTTTTCCCGCGNNNGCGTCC-3′ (SEQ ID NO:114), where NNN was the 3-nucleotide finger 2-target sequence and N′N′N′ its complement. A similar nonbiotinylated oligonucleotide, in which the target sequence was TGG (compTGG), was included at 7.2 nM in every round of selection to select against contaminating parental phage. Two pools of nonbiotinylated oligonucleotides were also used as competitors: one containing all 64 possible 3-nucleotide target sequences (compNNN), the other containing all the GNN target sequences except for the current selection target (compGNN). These pools were typically used as follows: round 1, no compNNN or compGNN; round 2, 7.2 nM compGNN; round 3, 10.8 nM compGNN; round 4, 1.8 μM compNNN, 25 nM compGNN; round 5, 2.7 μM compNNN, 90 nM compGNN; round 6, 2.7 μM compNNN, 250 nM compGNN; round 7, 3.6 μM compNNN, 250 nM compGNN.
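
As an illustration of the hairpin design described above, the following Python sketch builds the oligonucleotide for a given finger 2 target and lists the members of a compGNN pool. It assumes that N′N′N′, read 5′ to 3′, is the reverse complement of NNN, which is what the stem needs in order to base-pair. The function names are hypothetical, and the biotin label is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_target(nnn):
    """Build the hairpin oligonucleotide for a 3-nucleotide target NNN.

    Layout (5' to 3'): GGACGC N'N'N' CGCGGG TTTT CCCGCG NNN GCGTCC,
    with N'N'N' taken as the reverse complement of NNN (assumption).
    """
    if len(nnn) != 3 or set(nnn) - set("ACGT"):
        raise ValueError("target must be three bases drawn from A, C, G, T")
    return ("GGACGC" + reverse_complement(nnn) + "CGCGGG"
            + "TTTT"
            + "CCCGCG" + nnn + "GCGTCC")


def comp_gnn_targets(current_target):
    """All GNN triplets except the current selection target."""
    gnn = ["G" + a + b for a in "ACGT" for b in "ACGT"]
    return [t for t in gnn if t != current_target]


comp_tgg = hairpin_target("TGG")       # sequence of the compTGG competitor
pool_for_gcg = comp_gnn_targets("GCG")  # 15 competitor targets
print(comp_tgg)
print(len(pool_for_gcg))
```

With this reading, the first 15 bases of every hairpin are the reverse complement of the last 15, so the two arms pair across the TTTT loop.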

Example 2 Multi-Target Specificity Assays

The fragment of pComb3H (Pengue, G. & Lania, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1015-1020, Heinzel, T., Lavinsky, R. M., Mullen, T.-M., Söderström, M., Laherty, C. D., Torchia, J., Yang, W.-M., Brard, G., Ngo, S. D. & al., e. (1997) Nature 387, 43-46) phagemid RF DNA containing the zinc-finger coding sequence was subcloned into a modified pMAL-c2 (New England Biolabs) bacterial expression vector and transformed into XL1-Blue (Stratagene). Freeze/thaw extracts containing the overexpressed maltose binding protein-zinc finger fusion proteins were prepared from IPTG-induced cultures using the Protein Fusion and Purification System (New England Biolabs). In 96-well ELISA plates, 0.2 μg of streptavidin (Pierce) was applied to each well for 1 hour at 37° C., then washed twice with water. Biotinylated target oligonucleotide (0.025 μg) was applied similarly. ZBA/3% BSA was applied for blocking, but the wells were not washed after incubation. All subsequent incubations were at room temperature. Eight 2-fold serial dilutions of the extracts were applied in 1× binding buffer (ZBA/1% BSA/5 mM DTT/0.12 μg/μl sheared herring sperm DNA). The samples were incubated 1 hour, followed by 10 washes with water. Mouse anti-maltose binding protein mAb (Sigma) in ZBA/1% BSA was applied to the wells for 30 minutes, followed by 10 washes with water. Goat anti-mouse IgG mAb conjugated to alkaline phosphatase (Sigma) was applied to the wells for 30 minutes, followed by 10 washes with water. Alkaline phosphatase substrate (Sigma) was applied, and the OD₄₀₅ was quantitated with SOFTmax 2.35 (Molecular Devices).

Example 3 Gel Mobility Shift Assays

Fusion proteins were purified to >90% homogeneity using the Protein Fusion and Purification System (New England Biolabs), except that ZBA/5 mM DTT was used as the column buffer. Protein purity and concentration were determined from Coomassie blue-stained 15% SDS-PAGE gels by comparison to BSA standards. Target oligonucleotides were labeled at their 5′ or 3′ ends with [³²P] and gel purified. Eleven 3-fold serial dilutions of protein were incubated in 20 μl binding reactions (1× Binding Buffer/10% glycerol/≈1 pM target oligonucleotide) for three hours at room temperature, then resolved on a 5% polyacrylamide gel in 0.5× TBE buffer. Quantitation of dried gels was performed using a PhosphorImager and ImageQuant software (Molecular Dynamics), and the K_(D) was determined by Scatchard analysis.
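
The Scatchard step can be illustrated with a small Python sketch. It is not the ImageQuant workflow used in the Example, only a minimal model under stated assumptions: one-to-one binding, and labeled DNA at a concentration (≈1 pM) far below K_(D), so that free protein is approximately equal to total protein. Writing θ for the fraction of DNA bound, θ/[P] = 1/K_(D) − θ/K_(D), so a line fitted to θ/[P] against θ has slope −1/K_(D). The helper names, the 100 nM top concentration and the 0.5 nM K_(D) are invented for the demonstration.

```python
def serial_dilutions(top_conc, fold=3.0, n=11):
    """Concentrations for n serial dilutions, starting at top_conc."""
    return [top_conc / fold ** i for i in range(n)]


def scatchard_kd(protein_conc, fraction_bound):
    """Estimate K_D from a least-squares line through (theta, theta/[P]).

    Assumes free protein ~ total protein (DNA concentration << K_D).
    The fitted slope equals -1/K_D.
    """
    xs = list(fraction_bound)
    ys = [t / p for t, p in zip(fraction_bound, protein_conc)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -1.0 / slope


# Synthetic, noise-free data (hypothetical values, in nM).
TRUE_KD = 0.5
concs = serial_dilutions(100.0, fold=3.0, n=11)
theta = [p / (TRUE_KD + p) for p in concs]

kd_est = scatchard_kd(concs, theta)
print(round(kd_est, 6))
```

With real gel data the points scatter around the line, and the fit quality should be inspected before a K_(D) is reported.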

Example 4

Generation of Polydactyl Proteins with Desired DNA Binding Specificity

The studies reported here use the finger 2 (F2) variants pmGAC, pmGAG, pGCA, pGCC, pmGGA, pmGGC, pmGGG, and pGTG defined in the accompanying manuscript (Hudson, L. G., Ertl, A. P. & Gill, G. N. (1990) J. Biol. Chem. 265, 4389-4393). To generate DNAs encoding three-finger proteins, F2 coding regions were PCR amplified from selected or designed F2 variants and assembled by PCR overlap extension. Alternatively, DNAs encoding three-finger proteins with a Zif268 or Sp1C framework were synthesized from 8 or 6 overlapping oligonucleotides, respectively. Sp1C framework constructs, used for all reporter assays described in this report, were generated as follows. In the case of E2C-HS1(Sp1), 0.4 pmole each of oligonucleotides SPE2-3 (5′-GCG AGC AAG GTC GCG GCA GTC ACT AAA AGA TTT GCC GCA CTC TGG GCA TTT ATA CGG TTT TTC ACC-3′) (SEQ ID NO:115) and SPE2-4 (5′-GTG ACT GCC GCG ACC TTG CTC GCC ATC AAC GCA CTC ATA CTG GCG AGA AGC CAT ACA AAT GTC CAG AAT GTG GC-3′) (SEQ ID NO:116) were mixed with 40 pmole each of oligonucleotides SPE2-2 (5′-GGT AAG TCC TTC TCT CAG AGC TCT CAC CTG GTG CGC CAC CAG CGT ACC CAC ACG GGT GAA AAA CCG TAT AAA TGC CCA GAG-3′) (SEQ ID NO:117) and SPE2-5 (5′-ACG CAC CAG CTT GTC AGA GCG GCT GAA AGA CTT GCC ACA TTC TGG ACA TTT GTA TGG C-3′) (SEQ ID NO:118) in a standard PCR mixture and cycled 25 times (30 seconds at 94° C., 30 seconds at 60° C., 30 seconds at 72° C.). An aliquot of this pre-assembly reaction was then amplified with 40 pmole each of the primers SPE2-1 (5′-GAG GAG GAG GAG GTG GCC CAG GCG GCC CTC GAG CCC GGG GAG AAG CCC TAT GCT TGT CCG GAA TGT GGT AAG TCC TTC TCT CAG AGC-3′) (SEQ ID NO:119) and SPE2-6 (5′-GAG GAG GAG GAG CTG GCC GGC CTG GCC ACT AGT TTT TTT ACC GGT GTG AGT ACG TTG GTG ACG CAC CAG CTT GTC AGA GCG-3′) (SEQ ID NO:120) using the same cycling conditions. The E2C-HS2(Sp1) DNA was generated in the same way, using an analogous set of oligonucleotides differing only in the recognition helix coding regions. All assembled three-finger coding regions were digested with the restriction endonuclease Sfi1 and cloned into pMal-CSS, a derivative of the bacterial expression vector pMal-C2 (New England Biolabs). DNAs encoding six-finger proteins with each of the different frameworks were assembled in pMal-CSS using Xma1 and BsrF1 restriction sites included in the sequences flanking the three-finger coding regions. Each of the zinc finger proteins was expressed in the E. coli strain XL1-blue and binding properties were investigated by ELISA and gel shift analysis as described in the accompanying manuscript (Hudson, L. G., Ertl, A. P. & Gill, G. N. (1990) J. Biol. Chem. 265, 4389-4393).

Example 5 Construction of Zinc Finger-Effector Domain Fusion Proteins

For the construction of zinc finger-effector domain fusion proteins, DNAs encoding amino acids 473 to 530 of the ets repressor factor (ERF) repressor domain (ERD) (Sgouras, D. N., Athanasiou, M. A., Beal, G. J., Jr., Fisher, R. J., Blair, D. G. & Mavrothalassitis, G. J. (1995) EMBO J. 14, 4781-4793), amino acids 1 to 97 of the KRAB domain of KOX1 (Margolin, J. F., Friedman, J. R., Meyer, W., K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513), or amino acids 1 to 36 of the Mad mSIN3 interaction domain (SID) (Ayer, D. E., Laherty, C. D., Lawrence, Q. A., Armstrong, A. P. & Eisenman, R. N. (1996) Mol. Cell. Biol. 16, 5772-5781) were assembled from overlapping oligonucleotides using Taq DNA polymerase. The coding region for amino acids 413 to 489 of the VP16 transcriptional activation domain (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335, 563-564) was PCR amplified from pcDNA3/C7-C7-VP16 (10). The VP64 DNA, encoding a tetrameric repeat of VP16's minimal activation domain, comprising amino acids 437 to 447 (Seipel, K., Georgiev, O. & Schaffner, W. (1992) EMBO J. 11, 4961-4968), was generated from two pairs of complementary oligonucleotides. The resulting fragments were fused to zinc finger coding regions by standard cloning procedures, such that each resulting construct contained an internal SV40 nuclear localization signal, as well as a C-terminal HA decapeptide tag. Fusion constructs were cloned in the eucaryotic expression vector pcDNA3 (Invitrogen).

Example 6 Construction of Luciferase Reporter Plasmids

An erbB-2 promoter fragment comprising nucleotides -758 to -1, relative to the ATG initiation codon, was PCR amplified from human bone marrow genomic DNA with the TaqExpand DNA polymerase mix (Boehringer Mannheim) and cloned into pGL3basic (Promega), upstream of the firefly luciferase gene. A human erbB-2 promoter fragment encompassing nucleotides -1571 to -24 was excised from pSVOALΔ5′/erbB-2(N—N) (Hudson, L. G., Ertl, A. P. & Gill, G. N. (1990) J. Biol. Chem. 265, 4389-4393) by Hind3 digestion and subcloned into pGL3basic, upstream of the firefly luciferase gene.

Example 7 Luciferase Assays

For all transfections, HeLa cells were used at a confluency of 40-60%. Typically, cells were transfected with 400 ng reporter plasmid (pGL3-promoter constructs or, as negative control, pGL3basic), 50 ng effector plasmid (zinc finger constructs in pcDNA3 or, as negative control, empty pcDNA3), and 200 ng internal standard plasmid (phrAct-βGal) in a well of a 6 well dish using the lipofectamine reagent (Gibco BRL). Cell extracts were prepared approximately 48 hours after transfection. Luciferase activity was measured with luciferase assay reagent (Promega), βGal activity with Galacto-Light (Tropix), in a MicroLumat LB96P luminometer (EG&G Berthold). Luciferase activity was normalized to βGal activity.
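
The normalization described above, and the percent repression and fold activation figures quoted elsewhere in this specification, amount to simple ratios. The Python sketch below shows one way to compute them. The function names and all luminometer readings are hypothetical, chosen only to make the arithmetic easy to follow, and background subtraction is not modeled.

```python
def normalized_activity(luciferase, beta_gal):
    """Luciferase signal divided by the βGal internal-standard signal."""
    if beta_gal <= 0:
        raise ValueError("βGal reading must be positive")
    return luciferase / beta_gal


def percent_repression(sample, control):
    """Repression relative to the empty-vector control, in percent."""
    return 100.0 * (1.0 - sample / control)


def fold_activation(sample, control):
    """Activation relative to the empty-vector control."""
    return sample / control


# Hypothetical raw readings (arbitrary light units).
control = normalized_activity(luciferase=1000.0, beta_gal=50.0)      # empty pcDNA3
repressor = normalized_activity(luciferase=400.0, beta_gal=40.0)     # repressor fusion
activator = normalized_activity(luciferase=27000.0, beta_gal=50.0)   # activator fusion

repression_pct = percent_repression(repressor, control)
activation_fold = fold_activation(activator, control)
print(repression_pct, activation_fold)
```

Dividing by the βGal signal corrects for well-to-well differences in transfection efficiency before effector constructs are compared with the empty-vector control.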

Example 8

Regulation of the erbB-2 Gene in HeLa Cells

The erbB-2 gene was targeted for imposed regulation. The erbB-2 gene is frequently overexpressed in human cancers, particularly breast and ovarian, and elevated ErbB-2 levels correlate with a poor prognosis (N. E. Hynes and D. F. Stern, Biochim. Biophys. Acta 1198, 165 (1994)). To regulate the native erbB-2 gene, a synthetic repressor protein, designated E2C-KRAB, and a transactivator protein, designated E2C-VP64, were utilized (R. R. Beerli, D. J. Segal, B. Dreier, C. F. Barbas, III, Proc. Natl. Acad. Sci. USA 95, 14628 (1998)). Both proteins contain the same designed zinc finger protein E2C that recognizes the 18-bp DNA sequence 5′-GGG GCC GGA GCC GCA GTG-3′ (SEQ ID NO:121) in the 5′-untranslated region of the proto-oncogene erbB-2. This DNA-binding protein was constructed from 6 pre-defined and modular zinc finger domains (D. J. Segal, B. Dreier, R. R. Beerli, C. F. Barbas, III, Proc. Natl. Acad. Sci. USA 96, 2758 (1999)). The repressor protein contains the Kox-1 KRAB domain (J. F. Margolin et al., Proc. Natl. Acad. Sci. USA 91, 4509 (1994)), whereas the transactivator VP64 contains a tetrameric repeat of the minimal activation domain (K. Seipel, O. Georgiev, W. Schaffner, EMBO J. 11, 4961 (1992)) derived from the herpes simplex virus protein VP16.

A derivative of the human cervical carcinoma cell line HeLa,HeLa/tet-off, was utilized (M. Gossen and H. Bujard, Proc. Natl. Acad.Sci. USA 89, 5547 (1992)). Since HeLa cells are of epithelial originthey express ErbB-2 and are well suited for studies of erbB-2 genetargeting. HeLa/tet-off cells produce the tetracycline-controlledtransactivator, allowing induction of a gene of interest under thecontrol of a tetracycline response element (TRE) by removal oftetracycline or its derivative doxycycline (Dox) from the growth medium.We have used this system to place our transcription factors underchemical control. Thus, the pRevTRE/E2C-SKD and pRevTRE/E2C-VP64plasmids were constructed (The E2C(Sp1)-KRAB and E2C(Sp1)-VP64 codingregions were PCR amplified from pcDNA3-based expression plasmids (R. R.Beerli, D. J. Segal, B. Dreier, C. F. Barbas, III, Proc. Natl. Acad.Sci. USA 95, 14628 (1998)) and subcloned into pRevTRE (Clontech) usingBamH1 and Cla1 restriction sites, and into pMX-IRES-GFP [X. Liu et al.,Proc. Natl. Acad. Sci. USA 94, 10669 (1997)] using BamH1 and Not1restriction sites. Fidelity of the PCR amplification was confirmed bysequencing), transfected into HeLa/tet-off cells, and 20 stable cloneseach were isolated and analyzed for Dox-dependent target gene regulation(The pRevTRE/E2C-KRAB and pRevTRE/E2C-VP64 constructs were transfectedinto the HeLa/tet-off cell line (M. Gossen and H. Bujard, Proc. Natl.Acad. Sci. USA 89, 5547 (1992)) using Lipofectamine Plus reagent (GibcoBRL). After two weeks of selection in hygromycin-containing medium, inthe presence of 2 μg/ml Dox, stable clones were isolated and analyzedfor Dox-dependent regulation of ErbB-2 expression. Western blots,immunoprecipitations, Northern blots, and flow cytometric analyses werecarried out essentially as described [D. Graus-Porta, R. R. Beerli, N.E. Hynes, Mol. Cell. Biol. 15,1182 (1995)]). As a read-out of erbB-2promoter activity, ErbB-2 protein levels were initially analyzed byWestern blotting. 
A significant fraction of these clones showedregulation of ErbB-2 expression upon removal of Dox for 4 days, i.e.downregulation of ErbB-2 in E2C-KRAB clones and upregulation in E2C-VP64clones. ErbB-2 protein levels were correlated with altered levels oftheir specific mRNA, indicating that regulation of ErbB-2 expression wasa result of repression or activation of transcription. The additionalErbB-2 protein expressed in E2C-VP64 clones was indistinguishable fromnaturally expressed protein and biologically active, since epidermalgrowth factor (EGF) readily induced its tyrosine phosphorylation. TheErbB-2 levels in the E2C-KRAB clone #27, in the absence of Dox, werebelow the level of detection as was its EGF-induced tyrosinephosphorylation. Therefore, ErbB-2 expression was also analyzed by flowcytometry, revealing no detectable ErbB-2 expression in E2C-KRAB clone#27, in sharp contrast to the dramatic upregulation (5.6 fold) of ErbB-2in E2C-VP64 clone #18. Thus, the extent of erbB-2 gene regulation rangedfrom total repression (E2C-KRAB clone #27) to almost 6-fold activation(E2C-VP64 clone #18). No significant effect on the expression of therelated ErbB-1 protein was observed, indicating that regulation ofErbB-2 expression was not a result of general down- or up-regulation oftranscription. In contrast to the efficacy of these transcriptionfactors that target 18 bps of DNA sequence using six zinc fingerdomains, transcriptional activators prepared with three zinc fingerdomains that bind either of the 9-bp half-sites of the E2C targetsequence were unable to activate transcription of an erbB-2-luciferasereporter. These results suggest that the increased specificity andaffinity of six finger proteins may be required to provide a dominanteffect on gene regulation.

Example 9

Introduction of the Coding Regions of the E2C-KRAB and E2C-VP64 Proteinsinto the Retroviral Vector pMX-IRES-GFP

In order to express the E2C-KRAB and E2C-VP64 proteins in several other cell lines, their coding regions were introduced into the retroviral vector pMX-IRES-GFP. The E2C(Sp1)-KRAB and E2C(Sp1)-VP64 coding regions were PCR amplified from pcDNA3-based expression plasmids (R. R. Beerli, D. J. Segal, B. Dreier, C. F. Barbas, III, Proc. Natl. Acad. Sci. USA 95, 14628 (1998)) and subcloned into pRevTRE (Clontech) using BamH1 and Cla1 restriction sites, and into pMX-IRES-GFP [X. Liu et al., Proc. Natl. Acad. Sci. USA 94, 10669 (1997)] using BamH1 and Not1 restriction sites. Fidelity of the PCR amplification was confirmed by sequencing. This vector expresses a single bicistronic message for the translation of the zinc finger protein and, from an internal ribosome-entry site (IRES), the green fluorescent protein (GFP). Since both coding regions share the same mRNA, their expression is physically linked to one another and GFP expression is an indicator of zinc finger expression. Virus prepared from these plasmids was then used to infect the human carcinoma cell line A431 (pMX-IRES-GFP/E2C-KRAB and pMX-IRES-GFP/E2C-VP64 plasmids were transiently transfected into the amphotropic packaging cell line Phoenix Ampho using Lipofectamine Plus (Gibco BRL) and, two days later, culture supernatants were used for infection of target cells in the presence of 8 μg/ml polybrene. Three days after infection, cells were harvested for analysis). Three days after infection, ErbB-2 expression was measured by flow cytometry. Significantly, about 59% of the E2C-KRAB virus treated cells were essentially ErbB-2 negative, while in about 27% of the E2C-VP64 virus treated cells ErbB-2 levels were increased. Plotting of GFP fluorescence vs. ErbB-2 fluorescence revealed that there were two cell populations, one with normal ErbB-2 levels that was GFP negative, and another with altered ErbB-2 levels that was GFP positive. 
Specificity of genetargeting was investigated by measuring the expression levels of therelated ErbB-1 and ErbB-3 proteins. No significant alterations of theseprotein levels were detected, indicating that erbB-2 gene targeting isspecific and not a non-specific result of general alterations in geneexpression or overexpression of the effector domains. The lack of anyappreciable regulation of erbB-3 is particularly remarkable since its5′-UTR contains the 18 bp sequence 5′-GGa GCC GGA GCC GgA GTc-3′ (SEQ IDNO:122), that presents only 3 mismatches to E2C's designed targetsequence (15 bp identity—lowercase letters indicate differences) (M. H.Kraus, W. Issing, T. Miki, N. C. Popescu, S. A. Aaronson, Proc. Natl.Acad. Sci. USA 86, 9193 (1989)).
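
The mismatch count quoted for the erbB-3 site can be checked by aligning the two 18-bp sequences position by position. The Python sketch below does this with the sequences as printed above. The helper name is illustrative, and the comparison ignores case and spaces.

```python
E2C_TARGET = "GGG GCC GGA GCC GCA GTG"  # designed erbB-2 target site
ERBB3_SITE = "GGa GCC GGA GCC GgA GTc"  # related site in the erbB-3 5'-UTR


def mismatch_positions(seq_a, seq_b):
    """1-based positions at which two equal-length sequences differ."""
    a = seq_a.replace(" ", "").upper()
    b = seq_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = mismatch_positions(E2C_TARGET, ERBB3_SITE)
identical = 18 - len(positions)
print(positions, identical)
```

The comparison gives mismatches at positions 3, 14 and 18, leaving 15 identical base pairs, in agreement with the figures in the text.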

Example 10

Regulation of the erbB-2 Gene in Non-Human Primate Cells

The zinc finger target sequence within erbB-2's 5′-UTR lies within a 28-bp sequence stretch that is conserved in many species. To investigate regulation of erbB-2 gene expression in non-human primate cells, COS-7 fibroblasts were infected with the bicistronic E2C-KRAB retrovirus and analyzed by flow cytometry. As in human cells, expression of the repressor protein as indicated by the GFP marker correlated well with a loss of ErbB-2 protein. Similarly, gene targeting in murine cells was evaluated by infection of NIH/3T3 cells with E2C-KRAB and E2C-VP64 encoding retrovirus. ErbB-2 expression levels were then monitored by Western blotting rather than flow cytometry, due to a lack of reactivity of the mAb with the murine ErbB-2 extracellular domain. Here again, with E2C-KRAB a complete transcriptional knockout upon correction for infected cells was observed. However, unlike in human cell lines, E2C-VP64 induced ErbB-2 upregulation was rather modest in NIH/3T3 cells, approximately 1.8 fold upon correction for infection efficiency. A likely explanation for this discrepancy lies in the different structures of the human and mouse promoters. The mouse erbB-2 promoter, unlike the human, does not contain a TATA box (M. R. White and M. C. Hung, Oncogene 7, 677 (1992)). Transcriptional activation by VP16 is, at least in part, mediated by its interaction with TFIID, a multi-protein complex also containing the TATA-binding protein (C. J. Ingles, M. Shales, W. D. Cress, S. J. Triezenberg, J. Greenblatt, Nature 351, 588 (1991)). It is therefore plausible that the E2C-VP64 protein activates transcription less effectively in the absence of a TATA box. These data suggest that while a DNA binding site may be conserved with respect to sequence and relative position within a target cell, effector domains may need to be optimized for maximal efficiency due to context effects. Nevertheless, while their potencies may differ, the artificial transcription factors described here are capable of imposing regulation of erbB-2 gene transcription in cells derived from different species, providing a strategy for the study of gene function in a variety of organisms.

Example 11

Specific Induction of G1 Accumulation of ErbB-2 Overexpressing Tumor Cells

Overexpression of ErbB-2 leads to constitutive activation of its intrinsic tyrosine kinase activity (P. P. Di Fiore et al., Science 237, 178 (1987)), and it has been shown that downregulation of ErbB-2 in tumor cells overexpressing the receptor leads to growth inhibition (R. M. Hudziak et al., Mol. Cell. Biol. 9, 1165 (1989); J. Deshane et al., Gene Ther. 1, 332 (1994); J. M. Daly et al., Cancer Res. 57, 3804 (1997)). The mechanism of growth inhibition appears to be that progression of the cells from the G1 to the S phase of the cell cycle is prevented (R. M. Neve, H. Sutterluty, N. Pullen, H. A. Lane, J. M. Daly, W. Krek, N. E. Hynes, Submitted for publication). Thus, we investigated whether expression of our designed transcriptional repressor in erbB-2 overexpressing tumor cells would lead to a G1 block. To this end, SKBR3 breast cancer cells were infected with E2C-KRAB retrovirus and cell-cycle distribution was analyzed in relation to ErbB-2 expression levels by flow cytometry (22). Two cell populations were observed: about 40% of the cells were not infected and had normal ErbB-2 levels, while the infected cells, about 60%, displayed approximately 7-fold reduced receptor levels after 3 days. Compared to cells with normal receptor levels, a significantly larger fraction of cells with decreased ErbB-2 expression levels was in the G1 phase of the cell cycle. To ascertain that the G1 accumulation observed with SKBR3 cells was specific for ErbB-2 overexpressing tumor cells, a similar analysis was carried out with the T47D breast cancer cell line, which does not display elevated levels of ErbB-2 (FIG. 4B). Indeed, when T47D cells were infected with the E2C-KRAB retrovirus and subjected to flow cytometric analysis, cell populations with normal and reduced ErbB-2 levels were found to display indistinguishable DNA content. Thus, our designed repressor protein is able to specifically induce G1 accumulation of ErbB-2 overexpressing tumor cells. The ability to inhibit cell-cycle progression, and hence inhibit growth of ErbB-2 overexpressing tumor cells, suggests the potential of designed transcription factors for cancer gene therapy.

Example 12

Studies with erbB-3

Construction and Characterization of a Polydactyl Protein for Regulation of the erbB-3 Gene. Examination of the erbB-3 5′-UTR revealed the presence of an 18-bp sequence that was highly similar to the E2C target sequence in the erbB-2 5′-UTR (FIG. 2). Although they are at different distances and orientations with respect to the ATG initiation codons, the two sequences differ by only three nucleotides. Thus, a six-finger protein recognizing this sequence was made to investigate whether transcription factors could be designed to selectively regulate erbB-3 gene expression.

Described hereinbefore are several strategies for the construction of polydactyl proteins from defined, modular building blocks. The most successful strategy involved grafting of the amino acid residues of each zinc finger involved in base-specific DNA recognition (a short α-helical region referred to as the “recognition helix”) into the framework of the designed consensus protein Sp1C, a derivative of the transcription factor Sp1. Thus, the six-finger protein E3 designed to bind the 18-bp erbB-3 target sequence was built by using the Sp1C helix grafting strategy, the same method used for construction of the E2C protein described herein. An alignment of the E2C and E3 proteins reveals extensive sequence identity (FIG. 2). In particular, the entire protein framework, as well as three of the six recognition helices, are identical. Only the recognition helices of fingers 1, 2, and 6 were partially different, reflecting the fact that the 3-bp subsites recognized by these fingers differed by 1 nucleotide each.
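The relationship between the mismatched positions and fingers 1, 2, and 6 can be made explicit. The sketch below splits the 18-bp erbB-3 site (SEQ ID NO:122, with lowercase letters marking differences from the E2C target, as printed earlier in this document) into 3-bp subsites and assigns one subsite to each finger. It assumes the antiparallel arrangement commonly described for zinc finger proteins, in which finger 1 contacts the 3′-most triplet; that numbering convention and the function name are assumptions made for illustration and are not stated in this passage.

```python
# 5'->3' erbB-3 site; lowercase = position differing from the E2C target.
ERBB3_SITE = "GGa GCC GGA GCC GgA GTc"


def subsites_by_finger(case_coded_site: str) -> dict[int, str]:
    """Map finger number -> 3-bp subsite, ASSUMING finger 1 binds the 3'-most triplet."""
    triplets = case_coded_site.split()
    # Reverse the 5'->3' order so the last triplet is paired with finger 1.
    return {finger: t for finger, t in enumerate(reversed(triplets), start=1)}


by_finger = subsites_by_finger(ERBB3_SITE)

# Every subsite should have the form GNN.
assert all(len(t) == 3 and t.upper().startswith("G") for t in by_finger.values())

# Fingers whose subsite contains at least one changed nucleotide.
changed = sorted(f for f, t in by_finger.items() if any(b.islower() for b in t))
print(changed)
```

Under that assumed numbering, the fingers with altered subsites are 1, 2, and 6, each with a single changed nucleotide, which is consistent with the three partially different recognition helices described above.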

For a detailed analysis of its binding properties, the E3 protein was purified as a fusion with the maltose-binding protein. Initially, an ELISA analysis was carried out, revealing specific binding of the E3 protein to its target site, with little or no crossreactivity to various other 5′-(GNN)₆-3′ DNA sequences. A similar observation was made with the E2C protein. However, because of the similarity of the DNA sequences recognized, some crossreactivity of the two proteins with each other's target site was detected. To obtain a quantitative measure for the extent of discrimination between target and nontarget sequence, the affinities of the two proteins to each target sequence were determined by electrophoretic mobility-shift assay.

These studies revealed high-affinity binding of the E3 protein to its target, with a K_d value of 0.35 nM (±10%), whereas the affinity of binding to the E2C target sequence was about 30-fold lower, with a K_d value of 10 nM (±15%). Similarly, the affinity of the E2C protein to its target was subnanomolar, with a K_d value of 0.75 nM (±15%), whereas binding to the E3 site was significantly weaker, with a K_d value of 11 nM (±30%). Thus, both the E2C and the E3 proteins bind their respective target sequences with very high affinity and are able to discriminate between their cognate and very closely related DNA sequences.
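The degree of discrimination implied by these dissociation constants can be expressed as a simple ratio of non-cognate to cognate K_d. The following is a minimal sketch using only the four values reported above; the dictionary layout and function name are hypothetical, and the reported error margins are ignored.

```python
# Dissociation constants in nM, as reported in the text: KD_NM[protein][site].
KD_NM = {
    "E3": {"E3 site": 0.35, "E2C site": 10.0},
    "E2C": {"E2C site": 0.75, "E3 site": 11.0},
}


def fold_discrimination(protein: str) -> float:
    """Ratio of non-cognate to cognate K_d (larger = better discrimination)."""
    cognate = f"{protein} site"
    kds = KD_NM[protein]
    # Exactly one non-cognate site is recorded per protein.
    (noncognate_kd,) = [kd for site, kd in kds.items() if site != cognate]
    return noncognate_kd / kds[cognate]


for protein in KD_NM:
    print(f"{protein}: {fold_discrimination(protein):.1f}-fold")
```

With these inputs the ratio is roughly 29-fold for E3, matching the "about 30-fold" stated in the text, and roughly 15-fold for E2C.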

Designed transcription factors were generated by fusing the E3 protein to repression or activation domains. In a manner analogous to the E2C fusion constructs, the E3-KRAB protein was produced by fusing the KRAB repressor domain to E3's N terminus, while E3-VP64 was generated by fusing the synthetic VP64 transactivation domain to its C terminus.

To analyze the ability of the erbB-3-specific transcription factors to impose a dominant regulatory effect on the native erbB-3 gene, the E3-KRAB and E3-VP64 coding regions were introduced into the retroviral vector pMX-IRES-GFP. Retroviruses prepared from this vector were then used to infect A431 cells. Three days after infection, expression levels of various members of the ErbB receptor family were monitored by flow cytometry.

Dramatic alterations in the levels of ErbB-3 were detected in significant fractions of infected cell populations. Expression was abolished in 74% of E3-KRAB virus-infected cells, whereas almost 8-fold higher ErbB-3 levels were detected in 48% of E3-VP64 virus-infected cells. Plotting of ErbB-3 fluorescence against GFP fluorescence revealed that only GFP-positive, i.e., infected, cells displayed altered ErbB-3 levels. Thus, E3-based transcription factors are as potent as E2C-based transcription factors in regulating target gene expression.

In contrast to the efficient regulation of ErbB-3 expression, neither E3-KRAB nor E3-VP64 affected ErbB-1 and ErbB-2 expression levels. Given the similarity of the E3 and E2C target sequences, the lack of a significant effect on erbB-2 gene expression is yet another demonstration of the exquisite specificity inherent to the zinc finger-based gene switches described here.

1-39. (canceled)
40. A non-naturally occurring polypeptide which is a zinc finger nucleotide binding polypeptide, which non-naturally occurring polypeptide binds a nucleotide sequence of the form 5′-(GNN)-3′ as a result of the presence of one or more nucleotide sequence binding regions QS/PS/GN/D/T/H/S/ELVR, DP/SGN/D/E/T/H/K/A/SLVR, R/KSP/AN/Q/T/D/K/H/V/A/SLVR, or TS/PGN/E/T/D/K/H/SLVR.


41. The polypeptide of claim 40 which has two or more of the nucleotide sequence binding regions.
42. The polypeptide of claim 41 which has two to six of the nucleotide sequence binding regions.
43. The polypeptide of claim 41 which has two to twelve of the nucleotide sequence binding regions.
44. The polypeptide of claim 40 wherein the nucleotide sequence binding region is selected from the group consisting of: QSSNLVR (SEQ ID NO: 1); DPGNLVR (SEQ ID NO: 2); TSGNLVR (SEQ ID NO: 4); TSGELVR (SEQ ID NO: 8); DPGHLVR (SEQ ID NO: 10); TSGHLVR (SEQ ID NO: 12); QSSSLVR (SEQ ID NO: 13); DPGALVR (SEQ ID NO: 14); TSGSLVR (SEQ ID NO: 16); QSGNLVR (SEQ ID NO: 18); QPGNLVR (SEQ ID NO: 19); KSANLVR (SEQ ID NO: 22); QSSTLVR (SEQ ID NO: 25); QPGDLVR (SEQ ID NO: 27); QPGTLVR (SEQ ID NO: 30); TPGELVR (SEQ ID NO: 49); TSGDLVR (SEQ ID NO: 50); QSSDLVR (SEQ ID NO: 54); TPGTLVR (SEQ ID NO: 56); TSGTLVR (SEQ ID NO: 58); QSSHLVR (SEQ ID NO: 59); QSGHLVR (SEQ ID NO: 60); QPGHLVR (SEQ ID NO: 61); TPGHLVR (SEQ ID NO: 73); TSGKLVR (SEQ ID NO: 75); QPGELVR (SEQ ID NO: 76); QSGELVR (SEQ ID NO: 77); DPGSLVR (SEQ ID NO: 79); RSASLVR (SEQ ID NO: 91); KSASLVR (SEQ ID NO: 97); KSAALVR (SEQ ID NO: 98); and TSGELVR (SEQ ID NO: 107).


45. The polypeptide of claim 40 further comprising a transcription regulating polypeptide.
46. The polypeptide of claim 45 wherein the transcription regulating polypeptide activates transcription.
47. The polypeptide of claim 45 wherein the transcription regulating polypeptide inhibits transcription.
48. A method to prepare a vector encoding a non-naturally occurring polypeptide which is a zinc finger nucleotide binding polypeptide, comprising: a) selecting a heptapeptide that binds 5′-GNA-3′, 5′-GNC-3′, 5′-GNG-3′, or 5′-GNT-3′ in a nucleotide sequence of at least 9 base pairs; and b) preparing an expression vector that encodes a non-naturally occurring polypeptide having the selected heptapeptide sequence.
49. The method of claim 48 wherein the selected heptapeptide is QS/PS/GN/D/T/H/S/ELVR, DP/SGN/D/E/T/H/K/A/SLVR, R/KSP/AN/Q/T/D/K/H/V/A/SLVR, or TS/PGN/E/T/D/K/H/SLVR.


50. A method to detect or determine binding of a non-naturally occurring polypeptide to 5′-GNA-3′, 5′-GNC-3′, 5′-GNG-3′, or 5′-GNT-3′ in the genome of a cell, comprising: a) introducing to a host cell an expression vector that encodes a non-naturally occurring polypeptide having a heptapeptide that binds 5′-GNA-3′, 5′-GNC-3′, 5′-GNG-3′, or 5′-GNT-3′ in a nucleotide sequence of at least 9 base pairs; and b) detecting or determining whether the non-naturally occurring polypeptide binds a nucleotide sequence in the genome of the cell that has 5′-GNA-3′, 5′-GNC-3′, 5′-GNG-3′, or 5′-GNT-3′.
51. The method of claim 50 wherein the selected heptapeptide is QS/PS/GN/D/T/H/S/ELVR, DP/SGN/D/E/T/H/K/A/SLVR, R/KSP/AN/Q/T/D/K/H/V/A/SLVR, or TS/PGN/E/T/D/K/H/SLVR.


52. A method to control expression of sequences 3′ to a nucleotide sequence of at least 9 base pairs that includes 5′-GNA-3′, 5′-GNC-3′, 5′-GNG-3′, or 5′-GNT-3′ in the genome of a cell, comprising: expressing in a host cell an expression vector that encodes a non-naturally occurring polypeptide having a heptapeptide that binds 5′-GNA-3′, 5′-GNC-3′, 5′-GNG-3′, or 5′-GNT-3′ in a nucleotide sequence of at least 9 base pairs in an amount effective to regulate expression of nucleic acid linked 3′ to the nucleotide sequence and present in the genome of the host cell.
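For readers who wish to test a candidate heptapeptide against the degenerate binding-region formulas recited in the claims, the slash notation can be expanded mechanically. The sketch below is illustrative only and forms no part of the claims. It assumes that "X/Y" denotes alternative residues at a single position, so that, for example, QS/PS/GN/D/T/H/S/ELVR reads as Q, then S or P, then S or G, then one of N, D, T, H, S, or E, followed by LVR. That reading, and the function names, are assumptions introduced here.

```python
import re

# Degenerate heptapeptide formulas as written in the claims.
MOTIFS = [
    "QS/PS/GN/D/T/H/S/ELVR",
    "DP/SGN/D/E/T/H/K/A/SLVR",
    "R/KSP/AN/Q/T/D/K/H/V/A/SLVR",
    "TS/PGN/E/T/D/K/H/SLVR",
]


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert slash notation to a regex, ASSUMING 'X/Y' means X or Y at one position."""
    positions: list[str] = []
    i = 0
    while i < len(motif):
        if motif[i] == "/":
            # The residue after a slash is an alternative for the previous position.
            positions[-1] += motif[i + 1]
            i += 2
        else:
            positions.append(motif[i])
            i += 1
    return re.compile("".join(f"[{p}]" for p in positions))


PATTERNS = [motif_to_regex(m) for m in MOTIFS]


def matching_motifs(heptapeptide: str) -> list[str]:
    """Return the claim formulas that the given heptapeptide satisfies."""
    return [m for m, p in zip(MOTIFS, PATTERNS) if p.fullmatch(heptapeptide)]


print(matching_motifs("KSANLVR"))
```

Under this reading each formula expands to exactly seven positions, and KSANLVR (SEQ ID NO: 22) satisfies only the third formula.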